Previous Topic: Organizing Your InputNext Topic: Separate Predicates


Search Strategies

Search queries and driving cursors, which are large queries that provide the input data to a batch process, can be expensive. How you use queries presents a trade-off between the amount of program code you are willing to write and the performance of your application.

If you code a generic search query, you get generic performance. In the following example, the SELECT statement basically supports a direct read, a range read, and a restart read in one statement. To enable this type of generic access, you have to code a generic predicate. In most cases, this means that for every SQL statement issued more data will be read than is needed because DB2 has a limited ability to match these types of predicates.

In the following statement, the predicate supports a direct read, sequential read, and restart read for at least one part of a three-part compound key:

WHERE COL1 = WS-COL1-MIN                               
AND   ((    COL2 >= WS-COL2-MIN                       
        AND COL2 <=  WS-COL2-MAX                        
        AND COL3 >= WS-COL3-MIN                       
        AND COL3 <= WS-COL3-MAX)                     
       OR                                              
        (      COL2 > WS-COL2-MIN                   
         AND   COL2 <= WS-COL2-MAX ))    
OR    (    COL1 > WS-COL1-MIN               
       AND COL1 <= WS-COL1-MAX )

These predicates are flexible; however, they are not the best performing. The predicate in this example most likely results in a non-matching index scan even though an index on COL1, COL2, COL3 is available, which means that the entire index will have to be searched each time that the query is executed. This is not a bad access path for a batch cursor that is reading an entire table in a particular order. For any other query, however, it is problematic. This is especially true for online queries that are actually providing three columns of data (all min and max values are equal). For larger tables, the CPU and elapsed time that is consumed can be significant.