Previous Topic: Range-Partitioned Table SpaceNext Topic: Free Space


Clustering and Partitioning

The clustering index is not always the primary key. It is generally a sequential range retrieval key, and should be chosen by the most frequent range access to the table data. Range and sequential retrieval are the primary requirements, but partitioning is an important requirement and can be the most critical requirement, especially as tables get extremely large. If you do not specify an explicit clustering index, DB2 clusters by the index that is the oldest by definition (often referred to as the first index created). If the oldest index is dropped and recreated, that index will now be a new index and clustering will now be by the next oldest index.

The basic rule to clustering is that if your application will have a certain sequential access pattern or a regular batch process, you should cluster the data according to that input sequence.

Clustering and partitioning can be independent, and a log of options is available for organizing your data as follows:

You should choose a partitioning strategy that is based on a concept of application-controlled parallelism, separating old and new data, grouping data by time, or grouping data by some meaningful business entity (for example, sales region or office location). Within those partitions, you can cluster the data by your most common sequential access sequence.

Note: For more information about dismissing clustering for inserts, see Append Processing for High Volume Inserts.

For large tables, partitioning is the only way to store large amounts of data, but partitioning also has advantages for smaller tables. Consider the following: