Denormalization "Light"

Designing Tables and Indexes for Performance › Table Designs for Special Situations › Denormalization "Light"

Denormalization "Light"

In many situations, especially those in which there is a conversion from a legacy flat file-based system to a relational database, there is a performance concern (or more importantly a performance problem in that an SLA is not being met) for reading the multiple DB2 tables. These are situations in which the application is expecting to read all of the data that was once represented by a single record, but it is now in many DB2 tables.

In these situations, a typical response is to begin denormalizing the tables.

Denormalizing tables counteracts all the advantages of moving your data into DB2; that is, efficiency, portability, flexibility, and faster time to delivery for your new applications.

In some situations, however, the performance issues resulting from reading multiple tables that are compared to the equivalent single record read are unacceptable. In these instances, you can consider implementing denormalization "light" instead. This type of denormalization can be applied to parent and child tables, when the child table data is in an optional relationship to the parent. Instead of denormalizing the optional child table data into the parent table, add a column to the parent table. This configuration indicates whether the child table has any data for that parent key.

Note: If you apply this method, ensure that you factor in maintenance of that indicator column. However, DB2 can use a during join predicate to avoid probing the child table when there is no data for the parent key.

For example, assume that you have an account table and an account history table. The account may or may not have account history, and so the following query would join the two tables together to list the basic account information (balance) along with the history information if present:

SELECT A.CURR_BAL, B.DTE, B.AMOUNT
FROM   ACCOUNT A
LEFT OUTER JOIN
       ACCT_HIST B
ON     A.ACCT_ID = B.ACCT_ID
ORDER BY B.DTE DESC

In this example, the query will always probe the account history table in support of the join, whether or not the account history table has data. You can employ denormalization "light" by adding an indicator column to the account table. You can then use a during join predicate. DB2 will perform the join operation only when the join condition is true. In this case, the access to the account history table is avoided when the indicator column has a value not equal to Y:

SELECT A.CURR_BAL, B.DTE, B.AMOUNT
FROM   ACCOUNT A
LEFT OUTER JOIN
       ACCT_HIST B
ON     A.ACCT_ID = B.ACCT_ID
AND    A.HIST_IND = 'Y'
ORDER BY B.DTE DESC

DB2 is going to test that indicator column first before performing the join operation, and supply nulls for the account history table when data is not present as indicated.

This type of design provides significant benefits when you are doing a legacy migration from a single record system to around 40 relational tables with lots of optional relationships. This form of denormalizing can improve performance in support of legacy system access, while maintaining the relation design for efficient future applications.