Previous Topic: Defining PartitioningNext Topic: Additional Data Modeling Topics


Normalization

Normalization is a step-by-step technique that is used to ensure that the analyst has assigned each attribute to the proper entity type and defined sufficient entity types. The technique of normalization involves refining an entity type that is based on the interdependencies of its attributes.

The principles of normalization can be used throughout data analysis as each attribute is added to the model, and as a final confirmation of the model. Normalization can also be employed as needed for inspecting current system data and for defining entity types that data represents.

Each step of the technique results in the conformance of the entity type to a normal form. For the purposes of confirmation, it is sufficient to normalize to the third normal form (abbreviated 3NF).

C. J. Date (An Introduction to Database Systems, Vol. I, Addison Wesley.) describes higher normal forms.

The following table shows the three normal forms of normalization with the steps that cause an entity type to conform to these forms.

Normal Form

Description

First Normal Form (1NF)

Multi-valued attributes are removed to form a separate entity type. Thus, no entity type in 1NF can have repeating groups (multi-valued attributes).

Second Normal Form (2NF)

Attributes that are not fully dependent on the identifiers of the group are removed to a separate entity type.

Third Normal Form (3NF)

Attributes that are dependent on attributes, other than the identifiers are removed to a separate entity type.

The steps of normalizing data are best performed by examining example values of data that is found in the business, or developed with the help of users to illustrate the combinations of data values that arise.

The following illustration shows a group of attribute values that are laid out as columns in a table. Separate line items of a possible entity type ORDER appear as lines in the table.

A group of attribute values laid out as columns in a table.

The next three illustrations show the normalization of an attribute group. Normalization involves dividing the original entity type into implied entity types. Normalization also involves assigning attributes to the correct entity types; later this is useful for data design. In this case the table eventually breaks down into four implied entity types: PRODUCT, ORDER ITEM, ORDER, and CUSTOMER.

The following illustration shows an attribute group in first normal form (1NF) with sample data values.

First normal form

In second normal form (2NF) one level of multivalued groups of attributes are removed to the ORDER ITEM entity type to form a separate entity type as shown in the following illustration.

Second normal form

Product Code appears as the identifier of PRODUCT and as the foreign identifier of ORDER ITEM (1NF).

In third normal form (3NF), attributes that are not fully dependent on the identifier of the entity type are removed to additional entity types until no partial dependencies remain.

PRODUCT is already in 3NF because its attributes are dependent on the identifier Product Code. However, the attributes of ORDER ITEM are not fully dependent on Product Code. Removing these attributes to other entity types (ORDER then CUSTOMER) must be done to remove partial dependencies.

The following illustration shows the attribute group in third normal form (3NF).

Third normal form