Histogram statistics summarize data distribution on an interval scale and span the entire distribution of values in a table.
Histogram statistics divide values into quantiles. The quantile defines an interval of values. The quantity of intervals is determined using a specification of quantiles in a RUNSTATS execution. Thus, a quantile represents a range of values within the table. Each quantile contains approximately the same percentage of rows. This percentage can be more than the distribution statistics in that all values are represented.
Histogram statistics improve on distribution statistics because they provide value-distribution statistics that are collected over the entire range of values in a table. Collecting statistics over the entire table goes beyond the capabilities of distribution statistics because distribution statistics are limited to only the most or least occurring values in a table.
To query on the middle initial of a person's name, the initials are most likely in the range of A to Z. If you used only cardinality statistics, any predicate referencing the middle initial column would receive a filter factor of 1/COLARDF or 1/26. If you have used distribution statistics, any predicate that used a literal value could take advantage of the distribution of values to determine the best access path. That determination would depend on the value being recorded as one of the most or least occurring values. If it is not, the filter factor is determined based on the difference in frequency of the remaining values. Histogram statistics eliminate this guesswork.
If histogram statistics are calculated for the middle initial, they would appear similar to the following values:
QUANTILE |
LOWVALUE |
HIGHVALUE |
CARDF |
FREQ |
1 |
A |
G |
5080 |
20% |
2 |
H |
L |
4997 |
19% |
3 |
M |
Q |
5001 |
20% |
4 |
R |
U |
4900 |
19% |
5 |
V |
Z |
5100 |
20% |
The DB2 optimizer has information about the entire span of values in the table and any possible skew. This information is especially useful for the following range predicates:
SELECT EMPNO FROM EMP WHERE MIDINIT BETWEEN 'A' and 'G'
Copyright © 2014 CA Technologies.
All rights reserved.
|
|