Multi-Byte Character Set Search Limitations

Before you implement your Knowledge Management system, make sure that you understand the available parsing approaches and the limitations of multi-byte character set (MBCS) languages, so that you can set user expectations appropriately. These limitations affect search features that use Japanese, Chinese, or Korean text within the system. The Parse Settings page controls the word parsing mechanism that the search engine uses.

For the English, Other European, and Korean settings, the product assumes that words are separated by punctuation, white space, or both. This assumption allows the document text to be broken into individual words, noise words to be ignored, and known synonyms and special terms to be applied to search terms.
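
The delimiter-based approach can be illustrated with a minimal Python sketch. It is not the product's parser: the function name parse_delimited, the noise-word list, and the synonym table are all hypothetical, chosen only to show how splitting on punctuation and white space makes noise-word removal and synonym substitution possible.

```python
import re

# Hypothetical lists for illustration; the product's real noise words
# and synonyms are configured elsewhere and are not shown here.
NOISE_WORDS = {"the", "a", "of", "to"}
SYNONYMS = {"pc": "computer"}


def parse_delimited(text):
    """Split on punctuation and white space, drop noise words, apply synonyms."""
    # \W+ matches runs of non-word characters (white space and punctuation).
    # For str patterns in Python 3 it is Unicode-aware, so Hangul counts as word text.
    words = [w.lower() for w in re.split(r"\W+", text) if w]
    return [SYNONYMS.get(w, w) for w in words if w not in NOISE_WORDS]


print(parse_delimited("Reset the PC, then reboot."))
# ['reset', 'computer', 'then', 'reboot']
print(parse_delimited("프린터 오류"))
# ['프린터', '오류']
```

The Korean sample splits cleanly because it contains white space between words, which is the assumption this group of settings relies on.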

Alternatively, when the Far East language setting is selected, the parsing routine parses text character by character to accommodate Far East languages that do not use white-space delimiters between words. This setting tells the parser to treat each character as a full word, and it applies to all text to be searched. Because the language setting changes the way that search parsing works, the entire search index must be recreated whenever the language setting is changed to or from Far East.
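
The following Python sketch illustrates character-by-character parsing and why an index built under one setting cannot serve queries parsed under the other. Everything in it is an assumption made for illustration: the function names, the sample documents, the choice to skip white space, and the simple rule that a document matches when it contains every token of the query. The product's actual indexing and matching logic is not described in this section.

```python
def parse_per_character(text):
    """Far East style sketch: every non-white-space character is one word."""
    return [ch for ch in text if not ch.isspace()]


def parse_on_white_space(text):
    """Simplified stand-in for the delimiter-based settings."""
    return text.split()


def build_index(documents, parser):
    """Map each token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for token in parser(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, query, parser):
    """Return ids of documents that contain every token of the query."""
    token_hits = [index.get(token, set()) for token in parser(query)]
    return set.intersection(*token_hits) if token_hits else set()


documents = {1: "東京都の天気", 2: "京都の寺"}  # made-up sample text

print(parse_per_character("京都の寺"))  # ['京', '都', 'の', '寺']

# Index built before the setting change: each unspaced run is one token,
# so a query parsed per character finds nothing.
old_index = build_index(documents, parse_on_white_space)
print(search(old_index, "京都", parse_per_character))  # set()

# Index recreated with the per-character parser: the same query now matches.
new_index = build_index(documents, parse_per_character)
print(sorted(search(new_index, "京都", parse_per_character)))  # [1, 2]
```

Under the matching rule assumed here, the query 京都 also returns document 1, because that text contains both characters. This is the kind of result to keep in mind when setting user expectations for per-character parsing.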
