A table reference compression technique can be used where the set of expected values contained in a field across the file is small, and these values are known. It is not necessary to include all values that occur in the field in the table of expected values. If data is encountered in the field for which no matching table entry is specified, the data is not compressed; instead, it grows by one bit. No data is lost in this case, and the expanded field is identical to the field before compression. A message is printed with the statistics produced by the Compression Utility, indicating the number of times a value was encountered in a type S or X field that was not specified in the table of expected values. To achieve efficient compression, it is important that most values occurring in a field defined as type S or X are specified in the table of expected values.
Field types S and X are functionally equivalent; they differ only in how the table of expected values is specified. The table for type S is coded in EBCDIC characters. Any of the 256 possible byte values can be coded, but nongraphic data must be multipunched. The table for type X is coded in hexadecimal format, with each byte value coded as 2 hexadecimal digits.
RDL specifications for field types S and X are coded in a special format:
tmnv
| Parameter | Description |
|---|---|
| t | The field type specification, either S or X. |
| m | A 2-digit number (01≤m≤99; code the leading zero for values between 01 and 09), indicating the field length in bytes. |
| n | A 2-digit number (01≤n≤16; code the leading zero for values between 01 and 09), indicating the number of entries in the table of expected values. |
| v | The table of expected values. In this table, entries are coded consecutively until n values are specified. For type S, no space can be left between consecutive entries. The table must occupy exactly m*n positions in the field definition. If the table specification continues past column 72 of the current RDL statement, it starts again in column 1 of the next RDL statement. For type X, spaces can appear between pairs of hexadecimal digits for readability, but the table must contain exactly 2*m*n hexadecimal digits. |
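The length rules in the table can be checked mechanically. The following Python sketch is illustrative only and is not part of CA Compress: the function name `parse_table_field` is hypothetical, the EBCDIC code page is assumed to be 037 (Python's `cp037` codec), and continuation past column 72 is not handled.

```python
def parse_table_field(spec: str):
    """Split a type S or X definition of the form tmnv into its parts."""
    field_type = spec[0]
    if field_type not in ("S", "X"):
        raise ValueError("field type must be S or X")
    m = int(spec[1:3])   # field length, 2 digits
    n = int(spec[3:5])   # number of table entries, 2 digits
    body = spec[5:]      # the table of expected values (v)
    if field_type == "X":
        # Spaces between hexadecimal pairs are allowed for readability.
        digits = body.replace(" ", "")
        if len(digits) != 2 * m * n:
            raise ValueError("type X table must contain exactly 2*m*n hex digits")
        raw = bytes.fromhex(digits)
    else:
        # Type S entries are packed with no separators.
        if len(body) != m * n:
            raise ValueError("type S table must occupy exactly m*n positions")
        raw = body.encode("cp037")  # assumption: EBCDIC code page 037
    # Cut the table into n entries of m bytes each.
    entries = [raw[i * m:(i + 1) * m] for i in range(n)]
    return field_type, m, n, entries
```

A definition whose table is too short or too long for the stated m and n raises `ValueError` instead of being silently truncated.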
CA Compress uses a sequential search algorithm to determine whether a field value in the record appears in the table of expected values. To minimize processing overhead, code the table entries in decreasing order of probability of occurrence: the most likely value first and the least likely value last.
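The effect of entry order on a sequential search can be estimated with a short calculation. The Python sketch below is a simplified model, not a description of CA Compress internals: it assumes that a hit at position i costs i comparisons and that a value missing from the table costs a full scan. The frequencies are made up for illustration.

```python
def expected_comparisons(probabilities):
    """Average number of table entries examined by a front-to-back scan."""
    # A value found at position i (1-based) costs i comparisons.
    hit_cost = sum(i * p for i, p in enumerate(probabilities, start=1))
    # A value not in the table costs a scan of every entry.
    miss_cost = (1 - sum(probabilities)) * len(probabilities)
    return hit_cost + miss_cost


# Hypothetical frequencies for four expected values (5% of data is unlisted).
freqs = [0.05, 0.10, 0.20, 0.60]
least_likely_first = expected_comparisons(freqs)
most_likely_first = expected_comparisons(sorted(freqs, reverse=True))
print(least_likely_first, most_likely_first)
```

Under this model, placing the most likely value first roughly halves the average number of comparisons for these frequencies.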
The only limit on the size of an expected value table, other than 99 entries maximum, is the total space available in the FDT. Where applicable, table reference compression is the most efficient method, both in terms of compression ratio and processing overhead.
To show correctly coded type S and X field definitions, and to show the difference in the way expected values are coded between types S and X, consider the following definitions, which are equivalent:
S0103AB1 X0103C1C2F1
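The equivalence of the two definitions can be verified by converting the type S table to its EBCDIC byte values. This Python sketch assumes EBCDIC code page 037, which Python exposes as the `cp037` codec; the helper name `s_table_to_hex` is hypothetical.

```python
def s_table_to_hex(table: str) -> str:
    """Return the hexadecimal (type X) form of a type S table."""
    # Assumption: the table is coded in EBCDIC code page 037.
    return table.encode("cp037").hex().upper()


print(s_table_to_hex("AB1"))  # prints C1C2F1
```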
Suppose a file has a 4-byte field containing DOGb, CATb, FISH, BIRD, FROG, or some other value, where b represents a blank and other values occur infrequently. If this field is specified as:
S0405DOGbCATbFISHBIRDFROG
the 32-bit (that is, 4-byte) field compresses to 4 bits: one bit as an error flag and 3 bits to identify which of the 5 values occurs. A field containing any other value cannot be compressed and instead grows by one bit, to 33 bits.
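The arithmetic in this example generalizes to other field lengths and table sizes. The Python sketch below assumes that the table index occupies the smallest whole number of bits that can distinguish n entries, which matches the 3 bits quoted for 5 values; the actual encoding used by CA Compress is not specified here, and the function names and hit rate are hypothetical.

```python
def compressed_bits(m: int, n: int, hit: bool) -> int:
    """Bits used by one m-byte field with an n-entry table of expected values."""
    flag_bits = 1
    if hit:
        # Assumption: the index uses the fewest bits that can address n entries.
        index_bits = (n - 1).bit_length()
        return flag_bits + index_bits
    # A value not in the table is stored unchanged after the flag bit.
    return flag_bits + 8 * m


def average_bits(m: int, n: int, hit_rate: float) -> float:
    """Expected bits per field when hit_rate of the values are in the table."""
    return (hit_rate * compressed_bits(m, n, True)
            + (1 - hit_rate) * compressed_bits(m, n, False))


print(compressed_bits(4, 5, hit=True))   # 4 bits for DOG, CAT, FISH, BIRD, FROG
print(compressed_bits(4, 5, hit=False))  # 33 bits for any other value
```

With a hypothetical 95% hit rate, `average_bits(4, 5, 0.95)` comes to about 5.45 bits per field instead of the original 32, which shows why most values occurring in the field should appear in the table.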
Copyright © 2012 CA. All rights reserved.