The theoretical maximum compression ratio is different for each data set, because it depends upon the actual data contained in the data set. The CA Compress system provides multiple algorithms for compressing data, which you can selectively apply to individual fields within data records to maximize their compression.
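The dependence on the data can be made concrete with a generic information-theory measure. The sketch below estimates the zero-order (per-byte) Shannon entropy of a sample and the compression ratio that a coder treating each byte independently could approach. This is an illustration only. It is not how CA Compress computes anything. A real bound also depends on the model used (for example, patterns across bytes or fields), and the function names are hypothetical.

```python
import math
from collections import Counter

def order0_entropy_bits(data: bytes) -> float:
    """Average bits per byte, assuming each byte is independent (order-0 model)."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def order0_ratio_bound(data: bytes) -> float:
    """Approximate best original:compressed ratio for an order-0 coder (ignores table overhead)."""
    h = order0_entropy_bits(data)
    return float("inf") if h == 0 else 8.0 / h

print(order0_ratio_bound(b"ABAB"))            # 8.0: only 1 bit of information per byte
print(order0_ratio_bound(bytes(range(256))))  # 1.0: every byte value equally likely
```

Two samples of the same length can therefore have very different ceilings, which is why no single ratio applies to every data set.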
You accomplish this field-level compression with RDL specifications, which you code and supply as input when building the File Descriptor Table (FDT). The more completely and accurately the RDL describes the records, the closer the compression ratio comes to the theoretical maximum. However, each RDL specification you code adds processing overhead.
For example, suppose you know that a certain field contains textual data, such as a customer name, and that it never contains numeric characters. You can use this characteristic to distinguish it from a customer address field, which also contains text but does contain numeric characters. By defining these fields with two different RDL specifications, you may achieve a higher compression ratio for both fields than if the same RDL specification defined both.
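A rough way to see why separate specifications help is to count how many bits a fixed-width code needs for each character set. The sketch below assumes two hypothetical alphabets (uppercase letters plus a few punctuation marks for names, the same plus digits for addresses) and uses simple fixed-width coding as a stand-in; CA Compress's actual encodings are not described here and may differ.

```python
import math

# Hypothetical character sets, assumed for illustration only.
NAME_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ .,'-"   # 31 symbols, no digits
ADDRESS_ALPHABET = NAME_ALPHABET + "0123456789"      # 41 symbols

def bits_per_char(alphabet: str) -> int:
    """Width of a fixed-length code that can represent every symbol in the alphabet."""
    return max(1, math.ceil(math.log2(len(alphabet))))

def encoded_bits(value: str, alphabet: str) -> int:
    """Bits needed to store `value` with a fixed-width code over `alphabet`."""
    missing = set(value) - set(alphabet)
    if missing:
        raise ValueError(f"characters outside alphabet: {sorted(missing)}")
    return len(value) * bits_per_char(alphabet)

name = "SMITH, JOHN"
print(encoded_bits(name, NAME_ALPHABET))     # 55 (11 chars x 5 bits)
print(encoded_bits(name, ADDRESS_ALPHABET))  # 66 (11 chars x 6 bits)
```

If one shared definition had to cover both fields, the name field would pay for the digits it never uses (6 bits per character instead of 5 in this sketch).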
While it is often sufficient to know what kind of data a field holds, it is also helpful to know how the field's values are distributed across the file. For example, one of the most efficient ways to define a field to CA Compress is as a small set of fixed, expected values. A file may contain a warehouse name field in which only a few warehouses are represented. You can code an RDL specification that supplies these names as the set of expected values.
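The payoff of an expected-value list can be sketched as index coding: each stored name is replaced by its position in the list, so the field needs only enough bits to count the list entries. The class, the warehouse names, the 20-byte field width, and the escape code for unlisted values are all hypothetical assumptions for illustration, not CA Compress behavior.

```python
import math

class ExpectedValueCodec:
    """Hypothetical codec: encode a field as an index into a small list of expected values.

    Index len(values) is reserved as an escape code for values outside the list;
    how such values are then stored is out of scope for this sketch.
    """

    def __init__(self, values):
        self.values = list(values)
        self.index = {v: i for i, v in enumerate(self.values)}
        self.escape = len(self.values)
        # One extra symbol is counted for the escape code.
        self.width = max(1, math.ceil(math.log2(len(self.values) + 1)))

    def encode(self, value):
        return self.index.get(value, self.escape)

    def decode(self, code):
        if not 0 <= code < self.escape:
            raise ValueError("escape or out-of-range code: value not in the expected set")
        return self.values[code]

# Hypothetical warehouse names, assumed to occupy a 20-byte field in each record.
WAREHOUSES = ["ATLANTA", "CHICAGO", "DALLAS", "DENVER", "SEATTLE"]
codec = ExpectedValueCodec(WAREHOUSES)
print(codec.width)             # 3 bits instead of 160 bits for the assumed 20-byte field
print(codec.encode("DALLAS"))  # 2
print(codec.decode(2))         # DALLAS
```

The fewer distinct values a field takes across the file, the smaller the index, which is why knowing the distribution of values matters as much as knowing their type.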
| Copyright © 2012 CA. All rights reserved. |