

Analyzing KSDS Key-Sequenced Data Sets
Key-sequenced data sets are the hardest of the cluster types to tune properly because of the number of variables involved. Since the tuning parameters are installation- and data set-dependent, it would be impractical (if not impossible) to list an exhaustive set of criteria to follow. The more important values to check are listed below.
- As with the other cluster types, check the number of extents the data set occupies. Consolidate extents with an archive and restore to reduce volume fragmentation. (Archiving the data set to disk and then restoring it will avoid operator intervention to mount archive tapes!)
- As with entry-sequenced data sets, see if the data set has any spanned records. If it does, consider increasing the CI size of the data component if practical. Again, be careful with IMS and CICS databases: you may be limited in the buffer space that can be used. Also, if the cluster is accessed primarily in an online environment, spanned records may still be the proper choice because of the extra overhead involved in transferring larger control intervals.
- Check the index's control interval size. Many clusters are defined with a CI size that is too large for the index. If the CI size is greater than 512, verify that the index is indeed the proper size. This can be done in part by looking at the minimum freespace per CI value and at the average index entry length found in the index's statistics summary section. If the minimum freespace is less than the difference between the current CI size and the next-lower valid CI size (valid CI sizes for the index are 512, 1024, 2048, and 4096), you are already at the lowest CI size. If this is not the case, multiply the average index entry length by the number of control intervals per control area in the data component (field CI/CA in the catalog information section). This gives you a rough idea of the space needed to index the average data control area. Keep one important point in mind: if you make the CI size too small, you can force VSAM to use more index levels, which is undesirable in an online, direct access environment. IBM's VSAM Programmer's Guide devotes an entire section to optimizing VSAM's performance, including the impact of control interval size; read that section thoroughly before doing any tuning on clusters.
- Compare the CI size to the physical block size. The optimal situation is for the two to have the same value. If they differ, a different control interval size may be in order. Again, this may not be an option with IMS and CICS databases. Consult the VSAM Programmer's Guide (in the section on optimizing VSAM's performance) for a table showing the correspondence of CI size to physical block size, with space utilization guidelines.
- For clusters used extensively in sequential access mode (as opposed to direct keyed access), check the average control area deviation value. In a data set with a large number of CA splits, the seek movement can contribute significant overhead. You may want to consider reorganizing the cluster, and possibly increasing the freespace percentages at restore time, to reduce the frequency of splits in the future.
- Look at the unreferenceable CI count. If it is other than zero, you may be wasting DASD space. It is impossible to tell from the statistics summary alone whether the space is currently unusable or whether this is a projection of future problems. Remember that CA Disk simulates the process VSAM uses when allocating new control intervals from the freespace pointers. Therefore, there may indeed be enough freespace for current usage, while CA Disk is predicting a problem with future additions. To find out whether the problem exists currently, look at the sequence set graph that is produced. Under the UNREF CIs heading you will see the sequence set records that have unreferenceable CIs. If the FREESPACE value for the control interval is less than the AVG ENTL value, the space problem exists right now.
- Look at the dead space percentage. If a high percentage of your data set contains dead space, consider going to a larger data control interval size. As a rule, DASD utilization is more efficient at larger CI sizes. Also, if spanned records are present in the data set, they may be contributing significantly to the problem; again, consider increasing the CI size.
- Look at the space usage percentage and estimated record additions values. For data sets that are continually expanding, these numbers indicate when reorganization with additional space may be in order. This can keep your data set from going into additional extents or, worse yet, running out of space.
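The index CI sizing arithmetic described above can be sketched as a small calculation. The helper below is a hypothetical illustration, not part of CA Disk: it multiplies the average index entry length by the data component's CI/CA value and rounds up to the smallest valid index CI size. It ignores the control information VSAM actually keeps in each index CI, so treat the result only as a rough lower bound, as the text suggests.

```python
# Valid index control interval sizes, per the discussion above.
VALID_INDEX_CI_SIZES = (512, 1024, 2048, 4096)

def estimate_index_ci_size(avg_entry_len, data_cis_per_ca):
    """Roughly estimate the index CI size needed to index one data CA.

    avg_entry_len   -- average index entry length (index statistics summary)
    data_cis_per_ca -- CI/CA value for the data component (catalog information)

    Returns the smallest valid index CI size that could hold one entry per
    data CI in the CA, or None if even 4096 bytes is too small -- in which
    case VSAM would be forced to use more index levels, which is
    undesirable in an online, direct access environment.
    """
    needed = avg_entry_len * data_cis_per_ca
    for size in VALID_INDEX_CI_SIZES:
        if size >= needed:
            return size
    return None

# Example with made-up statistics: 12-byte average entry, 30 data CIs per CA
# -> 360 bytes needed, so a 512-byte index CI is a plausible fit.
print(estimate_index_ci_size(12, 30))
```

Note that this is only the sizing direction of the check; the freespace-difference test described above still applies before shrinking an existing index CI.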
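Similarly, the unreferenceable-CI test against the sequence set graph reduces to one comparison per sequence set record. The function below is a hypothetical sketch, assuming you have already extracted each record's FREESPACE value from the UNREF CIs section of the graph and the AVG ENTL value from the index statistics summary:

```python
def current_space_problems(freespace_values, avg_entry_len):
    """Return the sequence set records whose remaining freespace is already
    too small to hold another average-length index entry.

    freespace_values -- list of (record_id, FREESPACE) pairs taken from the
                        UNREF CIs section of the sequence set graph
    avg_entry_len    -- AVG ENTL from the index statistics summary

    A non-empty result means the unreferenceable-CI problem exists right
    now, not merely as a projection of future additions.
    """
    return [rec for rec, free in freespace_values if free < avg_entry_len]

# Example with made-up report values: only the second record is already
# too full to take another average-length entry.
records = [("SSR001", 40), ("SSR002", 8), ("SSR003", 25)]
print(current_space_problems(records, 12))
```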
Copyright © 2015 CA Technologies.
All rights reserved.
 