UTF-8 and MBCS Encoding

UTF-8 (8-bit Unicode Transformation Format) is a variable-width encoding that can represent every Unicode character using one to four bytes. On UNIX, it is treated like any other multi-byte character set. It is backward compatible with ASCII: every ASCII character is encoded as the same single byte in UTF-8.
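The variable width and the ASCII compatibility can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `utf8_length` is hypothetical and is not part of CA ITCM.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 needs to encode one character."""
    return len(ch.encode("utf-8"))

# One sample per encoded length: 1, 2, 3, and 4 bytes.
for ch in ["A", "\u00e9", "\u20ac", "\U0001f600"]:
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s): {ch.encode('utf-8').hex()}")

# ASCII text is byte-for-byte identical when encoded as UTF-8.
assert "file.txt".encode("ascii") == "file.txt".encode("utf-8")
```

Because of this property, plain ASCII file names and parameters need no conversion; only characters outside the ASCII range change their byte representation.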

CA ITCM code on Linux and UNIX generally operates in a UTF-8 locale. This causes problems when interfacing with an operating system that uses a non-UTF-8 locale. For this reason, all code that exchanges data with the operating system, such as file names, command line parameters, and so on, converts between the system MBCS locale and UTF-8.
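The kind of conversion described here can be sketched in Python as follows. This is not the CA ITCM implementation; the function names `system_to_utf8` and `utf8_to_system` are hypothetical, and the sketch assumes that the system encoding is either passed in explicitly or taken from the current locale through `locale.getpreferredencoding`.

```python
import locale
from typing import Optional

def system_to_utf8(raw: bytes, system_encoding: Optional[str] = None) -> bytes:
    """Convert bytes received from the OS (system locale) to UTF-8."""
    enc = system_encoding or locale.getpreferredencoding(False)
    return raw.decode(enc).encode("utf-8")

def utf8_to_system(raw: bytes, system_encoding: Optional[str] = None) -> bytes:
    """Convert UTF-8 bytes to the system locale before handing them to the OS."""
    enc = system_encoding or locale.getpreferredencoding(False)
    return raw.decode("utf-8").encode(enc)

# Example: a Japanese file name as an EUC-JP system would supply it.
name = "\u65e5\u672c\u8a9e.txt"
from_os = name.encode("euc_jp")

internal = system_to_utf8(from_os, "euc_jp")    # used inside the application
back_to_os = utf8_to_system(internal, "euc_jp") # passed back to the OS

print(from_os.hex(), internal.hex(), back_to_os == from_os)
```

Both helpers raise `UnicodeDecodeError` or `UnicodeEncodeError` when a character cannot be represented in the target encoding, so a real converter would also need a policy for such characters.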

A multi-byte character set (MBCS) typically uses one or two bytes per character and is used for character sets that contain a large number of different characters (for example, Asian language character sets).
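As an illustration, Shift-JIS is one such character set: it stores ASCII characters in one byte and Japanese kana and kanji in two bytes. The following sketch compares it with UTF-8. The helper name `bytes_per_char` is hypothetical, and Shift-JIS is chosen only as an example of an MBCS.

```python
def bytes_per_char(text: str, encoding: str) -> list:
    """Return the encoded length in bytes of each character in text."""
    return [len(ch.encode(encoding)) for ch in text]

sample = "A\u3042"  # Latin "A" followed by hiragana "a" (U+3042)

print(bytes_per_char(sample, "shift_jis"))  # [1, 2]
print(bytes_per_char(sample, "utf-8"))      # [1, 3]
```

The same text therefore has a different byte length in each encoding, which is why buffers and file names cannot be passed between the two without conversion.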