Primary Key Selection

Choosing the primary key of an entity is an important step that requires some serious consideration. Before you actually select a primary key, you may need to consider several attributes, which are referred to as candidate key attributes. Typically, the business user who has knowledge of the business and business data can help identify candidate keys.

For example, to correctly use the EMPLOYEE entity in a data model (and later in a database), you must be able to uniquely identify instances. In the customer table, you could choose from several potential key attributes including: the employee name, a unique employee number assigned to each instance of EMPLOYEE, or a group of attributes, such as name and birth date.

The rules that you use to select a primary key from the list of all candidate keys are stringent and can be consistently applied across all types of databases and information. The rules state that the attribute or attribute group must:

Example:

Consider which attribute you would select as a primary key from the following list of candidate keys for an EMPLOYEE entity:

If you use the rules previously listed to find candidate keys for EMPLOYEE, you might compose the following analysis of each attribute:

After analysis, there are two candidate keys-one is “employee-number” and the other is the group of attributes containing “employee-name” and “employee-birth-date.” “employee-number” is selected as the primary key since it is the shortest and ensures uniqueness of instances.

When choosing the primary key for an entity, modelers often assign a surrogate key, an arbitrary number that is assigned to an instance to uniquely identify it within an entity. “employee-number” is an example of a surrogate key. A surrogate key is often the best choice for a primary key since it is short, can be accessed the fastest, and ensures unique identification of each instance. Further, a surrogate key can be automatically generated by the system so that numbering is sequential and does not include any gaps.

A primary key chosen for the logical model may not be the primary key needed to efficiently access the table in a physical model. The primary key can be changed to suit the needs and requirements of the physical model and database at any point.