Unicode and Multi-Byte Character Set Support for Data Transfer

Before the advent of Unicode, many character sets were devised to represent the symbols used in the Chinese, Japanese, and Korean (CJK) languages. Today, Unicode is favored, and there is an ongoing transition from these legacy character sets to Unicode encodings, most notably UTF-8 and UTF-16.

Many legacy CJK multibyte character sets are ASCII based, as are the most commonly used Unicode encodings (for example, UTF-8 and UTF-16).

In the IBM mainframe (predominantly EBCDIC) world, however, composite character sets are commonly employed. These use a shift-in/shift-out encoding method, in which a single-byte ASCII or EBCDIC character set represents Latin characters and a multibyte character set represents non-Latin characters. Shift-in and shift-out control characters are inserted in the data stream to signal a switch between the two embedded character sets. For example, the CCSID 937 character set combines an EBCDIC single-byte character set with a Traditional Chinese multibyte character set, while the CCSID 938 character set combines an ASCII single-byte character set with the same Traditional Chinese multibyte character set.
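The shift-in/shift-out mechanism can be illustrated with a minimal Python sketch. It is not how CA XCOM Data Transport or ICU implements the conversion. It assumes the conventional control codes 0x0E (shift-out, switch to the double-byte set) and 0x0F (shift-in, return to the single-byte set), and it assumes that these byte values never occur inside a double-byte character. The function name split_shift_runs is hypothetical.

```python
SHIFT_OUT = 0x0E  # assumed: switches the stream to the double-byte set
SHIFT_IN = 0x0F   # assumed: returns the stream to the single-byte set


def split_shift_runs(data: bytes) -> list[tuple[str, bytes]]:
    """Split a composite byte stream into ("SBCS" | "DBCS", bytes) runs.

    The shift control bytes themselves are dropped from the output.
    """
    runs: list[tuple[str, bytes]] = []
    mode = "SBCS"  # a stream starts in single-byte mode
    current = bytearray()

    def flush() -> None:
        if not current:
            return
        if mode == "DBCS" and len(current) % 2 != 0:
            raise ValueError("double-byte run has an odd number of bytes")
        runs.append((mode, bytes(current)))
        current.clear()

    for byte in data:
        if byte in (SHIFT_OUT, SHIFT_IN):
            flush()  # close the run that was open under the old mode
            mode = "DBCS" if byte == SHIFT_OUT else "SBCS"
        else:
            current.append(byte)
    flush()
    return runs


# Two single-byte characters, two double-byte characters, one single-byte character.
sample = bytes([0xC1, 0xC2, 0x0E, 0x4C, 0x41, 0x4C, 0x42, 0x0F, 0xC3])
runs = split_shift_runs(sample)
```

Each run can then be handed to the converter for the corresponding embedded character set.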

CA XCOM Data Transport currently performs data transfers utilizing one of three data formats – ASCII, EBCDIC, or Binary.

This enhancement allows for transmission of text files that are encoded using multi-byte character sets, including in-flight conversion of data between different character sets. Two additional data formats can be specified for the CODE_FLAG parameter to allow for transmission of these files. In addition, new parameters have been added to the CA XCOM Data Transport global parameters and configuration parameters. These parameters allow you to specify the local and remote character sets to be used for file data conversion and actions for dealing with unconvertible characters.

CA XCOM Data Transport uses the ICU (International Components for Unicode) toolkit to perform data conversion. For information about the ICU toolkit, see the ICU website at http://site.icu-project.org/.

The CODE_FLAG parameter allows for two new data formats – UTF8 and UTF16. When one of these formats is specified for a transfer, data is converted to that format for transmission to the remote partner.

The LOCAL_CHARSET and REMOTE_CHARSET parameters specify the character sets of the local and remote files for the transfer. These parameters are used in conjunction with CODE_FLAG=UTF8 or CODE_FLAG=UTF16 to convert the data. If they are not specified for the transfer, they default to the value of the DEFAULT_CHARSET global parameter.
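The conversion path described above (local character set to the Unicode transfer format, then to the remote character set) can be sketched in Python as follows. This is only an illustration: Python's built-in codecs stand in for the ICU converters, the function names are hypothetical, and the codec names (big5, cp037) are Python names rather than values accepted by LOCAL_CHARSET or REMOTE_CHARSET.

```python
def encode_for_transfer(local_bytes: bytes, local_charset: str,
                        wire_format: str = "utf-8") -> bytes:
    """Sender side: local character set -> Unicode transfer format."""
    return local_bytes.decode(local_charset).encode(wire_format)


def decode_from_transfer(wire_bytes: bytes, remote_charset: str,
                         wire_format: str = "utf-8") -> bytes:
    """Receiver side: Unicode transfer format -> remote character set."""
    return wire_bytes.decode(wire_format).encode(remote_charset)


# An ASCII-based file is sent as UTF-8 and stored as EBCDIC on the remote side.
wire = encode_for_transfer(b"ABC", "ascii", "utf-8")
remote = decode_from_transfer(wire, "cp037", "utf-8")

# A Traditional Chinese file is sent as UTF-16 and stored in the same character set.
big5_file = "中文".encode("big5")
wire16 = encode_for_transfer(big5_file, "big5", "utf-16")
round_trip = decode_from_transfer(wire16, "big5", "utf-16")
```

Using Unicode as the common transfer format means that each partner only needs a converter between its own character set and Unicode, not one for every pair of character sets.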

To handle characters that cannot be converted between character sets, the MBCS_INPUTERROR and MBCS_CONVERROR parameters specify the action to take. The sending partner uses MBCS_INPUTERROR, and the receiving partner uses MBCS_CONVERROR. Each parameter specifies whether to substitute a replacement character or fail the transfer. If these parameters are not specified, the values of the DEFAULT_INPUTERROR and DEFAULT_CONVERROR global parameters are used.

The LOCAL_DELIM and REMOTE_DELIM parameters specify the encoding scheme that the corresponding character set uses and a list of delimiters that appear in the data as record separators.
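The encoding scheme matters for delimiters because a record separator occupies a different number of bytes in different encodings (a line feed is one byte in UTF-8 but two bytes in UTF-16). The following Python sketch splits file data into records by decoding it first and then matching delimiter characters, which avoids false matches across character boundaries. The function name and the delimiter list are assumptions made for illustration and do not reflect the actual LOCAL_DELIM or REMOTE_DELIM syntax.

```python
import re


def split_records(data: bytes, charset: str, delimiters: list[str]) -> list[str]:
    """Decode file data and split it into records on any listed delimiter."""
    text = data.decode(charset)
    # Try longer delimiters first so "\r\n" is not split as "\r" then "\n".
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = "|".join(re.escape(d) for d in ordered)
    return re.split(pattern, text)


utf16_data = "a\nb\r\nc".encode("utf-16-le")
records = split_records(utf16_data, "utf-16-le", ["\n", "\r\n"])
```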

This section contains the following topics:

New Global Parameters

New Configuration Parameters

Edit Transfer Record Screen

Detail History Record Screen

Global Parameters Screen

New and Changed Messages