
Unicode and Multi-Byte Character Set Support for Data Transfer

Before the advent of Unicode, a significant number of character sets were devised to permit the representation of symbols used in the Chinese, Japanese, Korean, and Taiwanese (CJK) languages. Today, Unicode is favored and there is an ongoing transition from these legacy character sets to Unicode encodings, most notably UTF-8 and UTF-16.

Many CJK legacy multibyte character sets are ASCII based, as are the most commonly used Unicode encodings (that is, UTF-8 and UTF-16).

In the IBM mainframe (predominantly EBCDIC) world, however, composite character sets are commonly employed, using a shift-in/shift-out encoding method. This mechanism lets a single-byte ASCII or EBCDIC character set represent Latin characters, in tandem with a multibyte character set that represents non-Latin characters. Shift-in and shift-out control characters are inserted in the data stream to signal a switch between the two embedded character sets. For example, the CCSID 937 character set combines an EBCDIC single-byte character set with a Traditional Chinese multibyte character set, while the CCSID 938 character set combines an ASCII single-byte character set with the same Traditional Chinese multibyte character set.
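The shift-in/shift-out mechanism can be illustrated with a short Python sketch. It is not part of CA XCOM Data Transport or IBM Unicode Services; the function name is hypothetical, and the sketch assumes the standard EBCDIC control codes X'0E' (shift-out, start of double-byte data) and X'0F' (shift-in, return to single-byte data).

```python
SO = 0x0E  # shift-out: the bytes that follow belong to the double-byte set
SI = 0x0F  # shift-in: the bytes that follow belong to the single-byte set


def split_shift_segments(data: bytes) -> list[tuple[str, bytes]]:
    """Split a mixed stream into ("SBCS", bytes) and ("DBCS", bytes) segments.

    Hypothetical helper for illustration only. It assumes the stream starts
    in single-byte mode and that SO/SI never occur inside a character.
    """
    segments = []
    mode = "SBCS"
    current = bytearray()
    for byte in data:
        if byte in (SO, SI):
            # A control character closes the current segment and flips the mode.
            if current:
                segments.append((mode, bytes(current)))
                current = bytearray()
            mode = "DBCS" if byte == SO else "SBCS"
        else:
            current.append(byte)
    if current:
        segments.append((mode, bytes(current)))
    return segments
```

Each returned segment could then be decoded with the single-byte or double-byte table that matches its mode.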

CA XCOM Data Transport currently performs data transfers utilizing one of three data formats – ASCII, EBCDIC, or Binary.

This enhancement allows for transmission of text files that are encoded using multi-byte character sets, including in-flight conversion of data between different character sets. Two additional data formats can be specified for the CODE parameter to allow for transmission of these files. In addition, new parameters have been added to the CA XCOM Data Transport configuration member, destination member and SYSIN01 parameters. These parameters allow you to specify the local and remote character sets to be used for file data conversion and actions for dealing with unconvertible characters.

CA XCOM Data Transport uses IBM Unicode Services to perform data conversion. For information about IBM Unicode Services, see the IBM manual Unicode Services User’s Guide and Reference, SA22-7649, which is available on the IBM website.

The CODE SYSIN01 parameter allows for two new data formats – UTF8 and UTF16. When one of these formats is specified for a transfer, data is converted to that format for transmission to the remote partner.

The LOCAL_CHARSET and REMOTE_CHARSET SYSIN01 parameters specify the character-set of the local and remote files for the transfer. These parameters are used in conjunction with CODE=UTF8 or CODE=UTF16 to perform the conversion of data. If not specified for the transfer, they default to the value specified for the DEFAULT_CHARSET parameter in the destination or configuration member.
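The fallback described above can be sketched in Python. This is an illustration only: the function name is hypothetical, and the order in which the destination member and the configuration member are consulted (destination first) is an assumption that the text does not state. The final fallback of CCSID 37 is the documented default for DEFAULT_CHARSET.

```python
from typing import Optional

FALLBACK_CCSID = "37"  # documented DEFAULT_CHARSET default (US EBCDIC)


def resolve_charset(transfer_value: Optional[str],
                    destination_default: Optional[str],
                    config_default: Optional[str]) -> str:
    """Return the first value that is set.

    Assumed precedence (not confirmed by the documentation):
    transfer parameter, then destination member, then configuration member.
    """
    for candidate in (transfer_value, destination_default, config_default):
        if candidate:
            return candidate
    return FALLBACK_CCSID
```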

To handle conversion problems between character sets, the additional parameters MBCS_INPUTERROR and MBCS_CONVERROR specify the action to take when a character is encountered that cannot be converted. The sending partner uses MBCS_INPUTERROR, and the receiving partner uses MBCS_CONVERROR. Each parameter specifies whether to replace the character with the replacement character defined in the IBM conversion table or to fail the transfer. If these parameters are not specified, the values of the DEFAULT_INPUTERROR and DEFAULT_CONVERROR parameters in the destination or configuration member are used.
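The difference between the two error actions can be shown with Python's built-in codecs as a stand-in for IBM Unicode Services. This is an analogy, not the product's implementation: the function name is hypothetical, the codec names (for example cp037 for CCSID 37) are Python's, and Python substitutes its own replacement characters rather than the one defined in an IBM conversion table.

```python
def convert_record(data: bytes, source: str, target: str,
                   input_error: str = "FAIL",
                   conv_error: str = "FAIL") -> bytes:
    """Convert one record, applying FAIL or REPLACE to each kind of error.

    Hypothetical helper: input_error mirrors MBCS_INPUTERROR and
    conv_error mirrors MBCS_CONVERROR.
    """
    modes = {"FAIL": "strict", "REPLACE": "replace"}
    # Input error: bytes that are not valid in the source character set.
    text = data.decode(source, errors=modes[input_error])
    # Conversion error: characters the target character set cannot represent.
    return text.encode(target, errors=modes[conv_error])
```

With the FAIL setting, a Greek letter converted from UTF-8 to cp037 raises UnicodeEncodeError, which corresponds to a failed transfer. With REPLACE, the record is converted and a substitute character takes the place of the letter.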

For USS files, the SYSIN01 parameters LOCAL_DELIM and REMOTE_DELIM specify the encoding scheme that the corresponding character set uses and a list of delimiters that exist within the data as record separators.
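The following Python sketch shows one way a delimiter list could be used to split a byte stream into records. It is a sketch under stated assumptions, not the product's logic: the function names are hypothetical, the value layout (encoding, then delimiter names, separated by colons) is inferred from values such as EBCDIC:NL and EBCDIC:CRLF:NL, and the byte table covers single-byte EBCDIC control codes only.

```python
# Standard EBCDIC control codes. Which names the product accepts beyond
# NL and CRLF is an assumption of this sketch.
EBCDIC_DELIMS = {"NL": b"\x15", "LF": b"\x25", "CRLF": b"\x0d\x25"}


def parse_delim(value: str) -> tuple[str, list[str]]:
    """Split a value such as "EBCDIC:CRLF:NL" into (encoding, delimiter names)."""
    encoding, *names = value.split(":")
    return encoding, names


def split_records(data: bytes, delim_names: list[str]) -> list[bytes]:
    """Split a byte stream on any of the named delimiters."""
    # Try longer delimiters first so that CRLF is matched as one unit.
    delims = sorted((EBCDIC_DELIMS[n] for n in delim_names), key=len, reverse=True)
    records, start, i = [], 0, 0
    while i < len(data):
        for delim in delims:
            if data.startswith(delim, i):
                records.append(data[start:i])
                i += len(delim)
                start = i
                break
        else:
            i += 1
    if start < len(data):
        records.append(data[start:])  # trailing record without a delimiter
    return records
```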

For receive requests where the target character set is different from the source character set, the LRECL needs to have a value specified which allows for the difference in the number of bytes per character. If the LRECL is not large enough to support the target character set, an XCOMM0144E SENDING RECLEN > MAX TARGET LENGTH error is issued.
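The effect of conversion on record length can be estimated before a transfer. The Python sketch below is an illustration only: the function name is hypothetical, and Python codec names such as cp037 stand in for CCSIDs. It returns the longest record length, in bytes, after conversion to the target character set.

```python
def required_lrecl(records: list[bytes], source: str, target: str) -> int:
    """Return the longest converted record length in bytes (0 if no records)."""
    return max(
        (len(record.decode(source).encode(target)) for record in records),
        default=0,
    )
```

For example, a four-character record that contains one accented letter occupies 4 bytes in cp037, 5 bytes in UTF-8, and 8 bytes in UTF-16, so the target LRECL must allow for the larger value.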

For SEND JOB requests to a z/OS partner, the remote CCSID (REMOTE_CHARSET) is required to be EBCDIC based as JES is unable to process non-EBCDIC characters.

The use of truncation (TRUNCATE=YES) is not supported for Unicode transfers. This is due to the possibility of data loss or corruption should truncation occur in the middle of a multi-byte character in the file.
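The corruption risk is easy to reproduce outside the product. In the Python sketch below (the function name is hypothetical), cutting a UTF-8 byte string at an arbitrary byte offset can leave an incomplete character at the end of the record, which no longer decodes.

```python
def truncate_is_safe(data: bytes, limit: int, encoding: str = "utf-8") -> bool:
    """Return True if cutting data at limit bytes still leaves decodable text."""
    try:
        data[:limit].decode(encoding)
        return True
    except UnicodeDecodeError:
        # The cut fell inside a multibyte character.
        return False
```

Two CJK characters occupy 6 bytes in UTF-8. Cutting after 3 bytes ends on a character boundary, while cutting after 4 bytes splits the second character.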

This section contains the following topics:

New Parameters

History Record Detail Screen

New and Changed Messages

New Parameters

This section describes the new PARM and configuration parameters that the CA XCOM Data Transport server and XCOMJOB TYPE=EXECUTE jobs accept for Unicode support.

DEFAULT_CHARSET

Specify the CCSID to use for files when a character set is not specified in the transfer parameters for the local or remote file. Optionally specify the search order of conversion techniques to use for data conversion by the IBM Unicode Services. The default CCSID is 37 for US EBCDIC.

DEFAULT_DELIM

Specify the encoding scheme of the data and record delimiters to use for character conversion to and from Unicode format. The encoding scheme is relevant for USS files being transferred. The default value is EBCDIC:NL, which indicates EBCDIC data encoding and new line character for record separators.

DEFAULT_CONVERROR

Specify the action for IBM Unicode Services to take when a character encountered for conversion by the receiving partner is not included in the output character set's character repertoire. The only supported options are REPLACE and FAIL, with FAIL being the default. If FAIL is specified, the transfer fails when an unconvertible character is detected. If REPLACE is specified, the character is replaced with the replacement character defined in the IBM conversion table.

DEFAULT_INPUTERROR

Specify the action for IBM Unicode Services to take when a character encountered for conversion by the sending partner is not consistent with the specified input character set. The only supported options are REPLACE and FAIL, with FAIL being the default. If FAIL is specified, the transfer fails when such a character is detected. If REPLACE is specified, the character is replaced with the replacement character defined in the IBM conversion table.

History Record Detail Screen

The History Record Detail screen provides information on the set of file transfers that are defined on the File Transfer Display Select screen. Additional information for Unicode transfers is presented.

CA XCOM SEND FILE REQ.# 001302 QUEUED MONDAY AUG. 01, 2011 09:32:15
COMMAND INPUT ===>
Local System Identification
Server: 1234XCOM    Port: 8040    Protocol: TCP
History System ID: XC12    History System Name: XCM123
Invoking Job: USER01       Sched. Start Time: MONDAY AUG. 01 2011 09:32:19
Transfer ID: USER01234     End Time: MONDAY AUG. 01 2011 09:32:19
Encoding : UTF-8           Last Action: * NOT USED*
Status: COMPLETED          Priority Sel: 016 Exec: 016
Compress Mode: RLE         Trans. Time (Secs): 1
Compress Factr: 02.0       Transfrd. Records: 1 Bytes: 48
Bytes/Sec: 1               Compress Bytes: 49
CPU: Time: 216,789         zIIP: Elig: 47,635
(ms) TCB: 169,010          (ms) zIIP: 45,685
SRB: 47,779                CPU: 1,949
Charset Input Error : REPLACE    Replace Count: 0
Charset Convert Error: REPLACE   Replace Count: 0
Last Ms: XCOMM0137I 1 RECORDS SENT SUCCESSFULLY - FILE=XCOM.USERID01.MG
----------- S E N D I N G   S Y S T E M   I N F O R M A T I O N -----------
System ID: *LOCAL*    User ID: USER01    Notify ID: N/A
Unit:    Volume:    File Type: FLAT FILE
File Name: XCOM.USER01.MGTCENT.TXT
Charset : CCSID#37/RE    Rec Delim: EBCDIC:NA
----------- R E C E I V I N G   S Y S T E M   I N F O R M A T I O N -----------
System ID: USER01-LAPTOP    User ID: user01    Notify ID: user01
Unit:    Volume:    File Type: REPLACE
File Name: c:\temp\mgtcent.txt
Charset : CCSID#850/ML    Rec Delim: EBCDIC:CRLF:NL
F1=Help F2=SPLIT F3=End F4=RETURN F5=RFIND F6=RCHANGE
F7=UP F8=DOWN F9=SWAP F10=Unicode F11=Hold F12=Alloc

The following fields have been added to this screen:

Encoding

The encoding scheme that is used for the data transfer.

Charset Input Error & Replace Count

For transfers that use a Unicode encoding scheme, specifies the action taken when the input file contains data that is not consistent with the specified input character set. The replace count is the number of characters for which the action was taken. For transfers on z/OS systems, the count is the number of data buffers for which the action was taken.

Charset Convert Error & Replace Count

For transfers that use a Unicode encoding scheme, specifies the action taken when the input file contains characters that cannot be converted because they are not included in the output character set's character repertoire. The replace count is the number of characters for which the action was taken. For transfers on z/OS systems, the count is the number of data buffers for which the action was taken.

Charset

Specifies the character set of the data.

Rec Delimiters

Specifies the encoding scheme for the character set and a set of possible delimiters to use for file processing. This parameter applies only to USS files on z/OS and files on the Linux/Unix/Windows (LUW) platforms.

New and Changed Messages

This section describes the new and changed messages to support this enhancement.

New messages for Unicode transfers are XCOMM0441I, XCOMM0442I, XCOMM0898I, XCOMM0899I and XCOMM0900E.

0441I

cccccc CHARSET= xxxxxxxxxxxxxxx

Reason:

When listing the destination member, the default Local and Remote character set information is displayed.

Action:

No action is required.

0442I

cccccc DELIMITERS= xxxxxxxxxxxxxxx

Reason:

When listing the destination member, the default Local and Remote Encoding and Delimiter information is displayed.

Action:

No action is required.

0899I

UNICODE CONVERSION DETECTED xxxxxxxxxxxxx CHARACTERS IN THE SOURCE DATA

Reason:

When performing character set conversion, data in the source file could not be converted.

When UNCONVERTABLE CHARACTERS is displayed, the source data contains either a character that has no equivalent in the target code page or the substitution character that is defined for the conversion table.

When MALFORMED CHARACTERS is displayed, the source data contains byte strings that do not represent a valid character in the source code page.

Action:

No action is required.

When UNCONVERTABLE CHARACTERS is displayed and MBCS_CONVERROR is specified as FAIL, the transfer is terminated. Otherwise, the replacement character defined in the conversion table that IBM Unicode Services uses replaces the character in the target data.

When MALFORMED CHARACTERS is displayed and MBCS_INPUTERROR is specified as FAIL, the transfer is terminated. Otherwise, the replacement character defined in the conversion table that IBM Unicode Services uses replaces the malformed character in the target data.

0900E

UNICODE CONVERSION ERROR - RC=XX REASON=XX - TRANSFER TERMINATED

Reason:

An error condition occurred in IBM Unicode Services while performing character conversion.

Action:

Determine the cause of the problem by using the return code and reason code. These codes are documented in the IBM z/OS Unicode Services User’s Guide and Reference.

0901E

PACK=LENGTH HAS BEEN SET FOR A UNICODE TRANSFER

Reason:

When processing a transfer for data in Unicode format, PACK=LENGTH is forced to improve CPU performance for character set conversion processing.

Action:

No action is required.

0902E

A VALID DELIMITER WAS NOT SPECIFIED FOR PATH=xxxxxxxxxxxxxxxxxxxxxxxxxx

Reason:

A valid delimiter for the end of the record was not specified for the USS file, preventing the Unicode transfer from completing.

Action:

Rerun the transfer specifying a valid delimiter.

0903E

BPX xxxxxxx ERROR xxxxxxxxx PATH=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Reason:

The specified BPX function failed while processing a USS file for a Unicode (CODE=UTF8 or CODE=UTF16) transfer. The file cannot be processed.

Action:

No action is required.

0907E

XXXXXXXXXXXXXXX MUTUALLY EXCLUSIVE DELIMITER OPTIONS SPECIFIED

Reason:

The delimiter options for either the LOCAL_DELIM or REMOTE_DELIM parameters specify options which are mutually exclusive.

Action:

Modify the delimiter options to remove any mutually exclusive options. The mutually exclusive options are described in the CA XCOM Data Transport User Guide under the LOCAL_DELIM and REMOTE_DELIM SYSIN01 parameters.

0932E

TRUNCATION IS NOT SUPPORTED FOR UNICODE TRANSFERS

Reason:

Using truncation for Unicode data can result in data loss or corruption if truncation occurs in the middle of an MBCS character.

Action:

Initiate the transfer without using truncation. Ensure that the logical record length (LRECL) is large enough to hold the converted Unicode data, which can require more bytes per record than the source data.