The example BinaryTextorConfig.xml file specifies three file formats:
First, the configuration file instructs the Binary Text Extractor to detect executables and .DLLs. These files all start with an ASCII magic number of "MZ".
In this example, two <Encoding> elements specify that the Binary Text Extractor extracts any ASCII or Unicode text string of 6 characters or longer. The <CharSet> element specifies the range of characters that are eligible for extraction. This example simply excludes the non-printing characters. CA DataMinder then applies policy to the extracted text.
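The effect of the minLength attribute and the <CharSet> range can be illustrated with a short Python sketch. This is not the Binary Text Extractor's implementation; the function name extract_ascii_strings and its parameters are hypothetical, and the sketch covers only the single-byte ASCII case (the UTF16_LITTLEENDIAN encoding is not modeled).

```python
def extract_ascii_strings(data: bytes, min_length: int = 6,
                          start: int = 0x20, end: int = 0x7E) -> list:
    """Collect runs of bytes inside [start, end] that reach min_length."""
    strings = []
    run = bytearray()
    for byte in data:
        if start <= byte <= end:
            run.append(byte)          # byte is eligible: extend the current run
        else:
            if len(run) >= min_length:
                strings.append(run.decode("latin-1"))  # one byte -> one character
            run.clear()               # an ineligible byte ends the run
    if len(run) >= min_length:        # flush a run that reaches end of data
        strings.append(run.decode("latin-1"))
    return strings


# Made-up sample bytes loosely resembling the start of an executable.
sample = (b"MZ\x90\x00\x03This program cannot be run in DOS mode.\r\n"
          b"\x00abc\x00kernel32.dll\x00")
print(extract_ascii_strings(sample))
```

In this sketch, "MZ" and "abc" are discarded because they are shorter than six characters, while the two longer runs are returned.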
Second, the configuration file instructs the Binary Text Extractor to detect .AFF files. The magic number for these files comprises the ASCII string "AFF Rev." plus an eight-character hexadecimal number. The ASCII string and the hexadecimal number are not contiguous in the file; in this example, they are defined at offsets 56 and 240 respectively. The configuration file therefore uses two <MagicNumber> elements to define the magic number format.
The <Encoding> element specifies that the Binary Text Extractor extracts any ASCII text string of 5 characters or longer from these .AFF files. As above, the <CharSet> element excludes strings containing non-printing characters. CA DataMinder then applies policy to the extracted text.
Third, the configuration file instructs the Binary Text Extractor to detect .DLIS files. The magic number for these files comprises an ASCII string in the format "V?.??RECORD", where each "?" stands for a character that varies. Because you cannot use wildcards to specify magic numbers, the configuration file uses a combination of three <MagicNumber> elements, one for each fixed part of the string, to define the magic number format.
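The way several fixed fragments stand in for a wildcard pattern can be sketched in Python. This is an illustration only, under two assumptions that the text does not state explicitly: offSet values are zero-based byte offsets, and a file matches only when every <MagicNumber> element matches. The names DLIS_MAGIC and matches_magic are hypothetical.

```python
# Hypothetical model of the three <MagicNumber> elements for DLIS files.
# Each pair is (offSet, ASCII value); offsets are assumed to be zero-based.
DLIS_MAGIC = [(9, b"RECORD"), (4, b"V"), (6, b".")]


def matches_magic(data: bytes, magic=DLIS_MAGIC) -> bool:
    """Return True only if every fixed fragment is found at its offset."""
    return all(data[offset:offset + len(value)] == value
               for offset, value in magic)


# Bytes 0-3 are not checked; bytes 5, 7 and 8 play the role of the "?" wildcards.
print(matches_magic(b"0001V1.00RECORD"))  # True
print(matches_magic(b"0001X1.00RECORD"))  # False: no "V" at offset 4
```

Only the fixed positions are compared, so any characters can occupy the unchecked positions, which is the effect a wildcard would have.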
The <Encoding> element specifies that the Binary Text Extractor extracts any ASCII text string of 5 characters or longer from these .DLIS files. As above, the <CharSet> element excludes strings containing non-printing characters. CA DataMinder then applies policy to the extracted text.
Example Configuration File
<?xml version="1.0" encoding="utf-8" ?>
<UniversalBinaryTextor>

  <!-- Executable "MZ" -->
  <FileType name="Executable/DLL">
    <MagicNumber value="MZ" type="ascii-string" offSet="0" />
    <Encoding name="ASCII" minLength="6">
      <CharSet start="0x20" end="0x7E" />
    </Encoding>
    <Encoding name="UTF16_LITTLEENDIAN" minLength="6">
      <CharSet start="0x20" end="0xFF" />
    </Encoding>
  </FileType>

  <!-- AFF Rev. X.6 -->
  <FileType name="Advanced File Format">
    <MagicNumber value="AFF Rev." type="ascii-string" offSet="56" />
    <MagicNumber value="FFFFFFFF" type="hex-string" offSet="240" />
    <Encoding name="ASCII" minLength="5">
      <CharSet start="0x20" end="0x7E" />
    </Encoding>
  </FileType>

  <!-- Digital Log Interchange Standard (DLIS)
       http://w3.energistics.org/rp66/v1/Toc/main.html -->
  <FileType name="DLIS">
    <MagicNumber value="RECORD" type="ascii-string" offSet="9" />
    <MagicNumber value="V" type="ascii-string" offSet="4" />
    <MagicNumber value="." type="ascii-string" offSet="6" />
    <Encoding name="ASCII" minLength="5">
      <CharSet start="0x20" end="0x07F" />
    </Encoding>
  </FileType>

</UniversalBinaryTextor>
Note: See the following Schema Notes for guidelines on writing a custom BTE configuration file.
Copyright © 2014 CA.
All rights reserved.