ENC API

Introduction

ENC API provides character code conversion features.

Supported Character Encodings

The internal character encoding of Revolution is UTF-16BE.
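
As an illustration of what UTF-16BE means at the byte level, the following is a minimal sketch that serializes one Unicode code point with the high byte first. The function name utf16be_encode is hypothetical and not part of ENC API, and this page does not state whether ENC API handles surrogate pairs, so the four-byte branch reflects only the general UTF-16 definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an ENC API function.
 * Writes one code point as UTF-16BE (high byte first) into out,
 * which must have room for 4 bytes.
 * Returns the number of bytes written (2 or 4), or 0 if cp is a
 * surrogate or lies above U+10FFFF. */
size_t utf16be_encode(uint32_t cp, uint8_t *out)
{
    assert(out != NULL);

    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;
    }
    if (cp < 0x10000) {
        /* Basic Multilingual Plane: one 16-bit unit. */
        out[0] = (uint8_t)(cp >> 8);
        out[1] = (uint8_t)(cp & 0xFF);
        return 2;
    }

    /* Supplementary planes: a high and a low surrogate. */
    cp -= 0x10000;
    uint16_t hi = (uint16_t)(0xD800 | (cp >> 10));
    uint16_t lo = (uint16_t)(0xDC00 | (cp & 0x3FF));
    out[0] = (uint8_t)(hi >> 8);
    out[1] = (uint8_t)(hi & 0xFF);
    out[2] = (uint8_t)(lo >> 8);
    out[3] = (uint8_t)(lo & 0xFF);
    return 4;
}
```

For example, U+3042 becomes the two bytes 0x30 0x42.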

ENC API supports bi-directional conversion between the following encodings and the internal character encoding.

US-ASCII
UTF-8
UTF-16BE
UTF-32BE
ISO-8859-1
ISO-8859-2
ISO-8859-3
ISO-8859-7
ISO-8859-10
ISO-8859-15
ISO-2022-JP
Shift_JIS
UHC
GB2312
windows-1252

ENC API supports one-way conversion only, from the following encodings to the internal character encoding.

UTF-7
UTF-16
UTF-16LE
windows-1250
windows-1253
macintosh
x-mac-ce
x-mac-greek
IBM850
IBM852

Character Encoding Name Matching

ENC API matches the character encoding names based on the following rules.

Convert all letters to lowercase.
If the name starts with "x-" or "cs", remove that prefix.
Remove all characters that are neither letters nor digits.
Compare with the matching strings of individual character encoding names.
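
The four rules can be sketched as follows. This is a minimal illustration, not the ENC API implementation: the names normalize_encoding_name and lookup_encoding_name are hypothetical, only ASCII letters are lowercased, and the lookup table holds just a handful of matching strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static char ascii_lower(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

static int is_ascii_alnum(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9');
}

/* Hypothetical helper: applies rules 1 to 3 to name and stores the
 * result in out. Returns the result length, or 0 if the result is
 * empty or does not fit in out_size bytes. */
size_t normalize_encoding_name(const char *name, char *out, size_t out_size)
{
    assert(name != NULL && out != NULL && out_size > 0);

    /* Rules 1 and 2: compare the prefix case-insensitively, then skip it. */
    char c0 = ascii_lower(name[0]);
    char c1 = (c0 != '\0') ? ascii_lower(name[1]) : '\0';
    if ((c0 == 'x' && c1 == '-') || (c0 == 'c' && c1 == 's')) {
        name += 2;
    }

    /* Rules 1 and 3: keep only letters and digits, lowercased. */
    size_t n = 0;
    for (; *name != '\0'; ++name) {
        if (!is_ascii_alnum(*name)) {
            continue;
        }
        if (n + 1 >= out_size) {
            out[0] = '\0';
            return 0;
        }
        out[n++] = ascii_lower(*name);
    }
    out[n] = '\0';
    return n;
}

/* A small excerpt of the matching strings, for illustration only. */
typedef struct {
    const char *match;
    const char *canonical;
} NameEntry;

static const NameEntry k_names[] = {
    { "utf8",     "UTF-8"      },
    { "latin1",   "ISO-8859-1" },
    { "shiftjis", "Shift_JIS"  },
    { "sjis",     "Shift_JIS"  },
    { "euckr",    "UHC"        },
    { "cp949",    "UHC"        },
    { "macce",    "x-mac-ce"   },
};

/* Rule 4: compare the normalized name with each matching string.
 * Returns the character encoding name, or NULL if nothing matches. */
const char *lookup_encoding_name(const char *name)
{
    char key[64];
    if (normalize_encoding_name(name, key, sizeof key) == 0) {
        return NULL;
    }
    for (size_t i = 0; i < sizeof k_names / sizeof k_names[0]; ++i) {
        if (strcmp(key, k_names[i].match) == 0) {
            return k_names[i].canonical;
        }
    }
    return NULL;
}
```

With this sketch, "x-sjis", "csShiftJIS", and "Shift_JIS" all resolve to Shift_JIS, because each one normalizes to either "sjis" or "shiftjis".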

The individual character encoding names and the matching strings are as follows.

Character Encoding Name	Matching Strings
US-ASCII	usascii ascii us ansix341968 ansix341986 cp367 ibm367 iso646irv1991 iso646us isoir6
UTF-8	utf8 utf8n unicode11utf8 unicode20utf8
UTF-16BE	utf16be ucs2 ucs2be unicode11 unicode20 unicode20utf16 unicodeascii unicodelatin1 iso10646 iso10646j1 iso10646ucs2 iso10646ucs2be iso10646ucsbasic iso10646unicodelatin1
UTF-32BE	utf32be utf32 ucs4 ucs4be iso10646ucs4 iso10646ucs4be
ISO-8859-1	iso88591 latin1 l1 cp819 ibm819 isolatin1 iso885911987 isoir100
ISO-8859-2	iso88592 latin2 l2 isolatin2 iso885921987 isoir101
ISO-8859-3	iso88593 latin3 l3 isolatin3 iso885931988 isoir109
ISO-8859-7	iso88597 greek greek8 isolatingreek iso885971987 isoir126 ecma118 elot928 suneugreek
ISO-8859-10	iso885910 latin6 isolatin6 l6
ISO-8859-15	iso885915 latin9 iso8859101992 isoir157
ISO-2022-JP	iso2022jp iso2022jp1 iso2022jp2
Shift_JIS	shiftjis sjis mscp932 mskanji windows31j
UHC	euckr ksc56011987 isoir149 ksc56011989 ksc5601 korean uhc cp949 windows949
GB2312	gb2312 gb231280 isoir58 chinese iso58gb231280 euccn
windows-1252	windows1252 cp1252 windows30latin1 windows31latin1 iso88591windows30latin1 iso88591windows31latin1

Character Encoding Name	Matching Strings
UTF-7	utf7 unicode11utf7 unicode20utf7 cp65000
UTF-16	utf16 cp1200 ibm1200
UTF-16LE	utf16le ucs2le iso10646ucs2le
windows-1250	windows1250 cp1250 windows31latin2 iso88592windowslatin2
windows-1253	windows1253 cp1253
macintosh	macintosh mac macroman
x-mac-ce	macce
x-mac-greek	macgreek
IBM850	ibm850 cp850 850 pc850multilingual
IBM852	ibm852 cp852 852 pcp852

Conversion Rules of Individual Encodings

ISO-8859

The ISO-8859 encodings are converted both to and from the internal character encoding.
ISO-8859-1 is an exception: input is treated as windows-1252 when converting to the internal character encoding, but output is treated as ISO-8859-1 when converting from the internal character encoding.
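
The asymmetry for ISO-8859-1 can be sketched as follows. The function names are hypothetical and not part of ENC API. The table holds the standard windows-1252 assignments for bytes 0x80 to 0x9F. How ENC API treats the five bytes that windows-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D) is not stated here, so passing them through unchanged is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* windows-1252 assignments for bytes 0x80..0x9F.
 * Unassigned bytes map to themselves (an assumption of this sketch). */
static const uint16_t k_cp1252_high[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
};

/* Decoding direction: the byte is interpreted as windows-1252. */
uint16_t latin1_byte_to_internal(uint8_t b)
{
    if (b >= 0x80 && b <= 0x9F) {
        return k_cp1252_high[b - 0x80];
    }
    return b; /* Identical to ISO-8859-1 everywhere else. */
}

/* Encoding direction: strict ISO-8859-1, so only U+0000..U+00FF fit.
 * Returns 1 and stores the byte on success, 0 if cp has no mapping. */
int internal_to_latin1_byte(uint16_t cp, uint8_t *out)
{
    assert(out != NULL);
    if (cp > 0x00FF) {
        return 0;
    }
    *out = (uint8_t)cp;
    return 1;
}
```

One consequence is that a round trip is not always possible: byte 0x80 decodes to 0x20AC (the euro sign), but 0x20AC cannot be encoded back to ISO-8859-1.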

Japanese Character Encoding

The ISO-2022-JP and Shift_JIS conversion rules are compatible with Windows conversion with certain exceptions.

When converting from the internal character encoding to ISO-2022-JP or Shift_JIS, the following mappings, which the Windows conversion does not have, are also used.

Internal Character Encoding	ISO-2022-JP	Shift_JIS
0x203E	0x7E	0x7E
0x2014	0x213D	0x815C
0x2016	0x2142	0x8161
0x2212	0x215D	0x817C
0x301C	0x2141	0x8160
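
One way to picture these extra mappings is as a small override table that is consulted before the regular conversion table. The sketch below only encodes the five rows above. The type and function names are hypothetical, and the lookup order is an assumption rather than a description of how ENC API is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the table above. */
typedef struct {
    uint16_t internal;   /* internal character encoding (UTF-16BE) */
    uint16_t iso2022jp;  /* ISO-2022-JP code */
    uint16_t sjis;       /* Shift_JIS code */
} JpOverride;

static const JpOverride k_jp_overrides[] = {
    { 0x203E, 0x007E, 0x007E },
    { 0x2014, 0x213D, 0x815C },
    { 0x2016, 0x2142, 0x8161 },
    { 0x2212, 0x215D, 0x817C },
    { 0x301C, 0x2141, 0x8160 },
};

/* Hypothetical helper: returns the override row for cp, or NULL if
 * cp is not one of the five special cases. */
const JpOverride *find_jp_override(uint16_t cp)
{
    size_t count = sizeof k_jp_overrides / sizeof k_jp_overrides[0];
    for (size_t i = 0; i < count; ++i) {
        if (k_jp_overrides[i].internal == cp) {
            return &k_jp_overrides[i];
        }
    }
    return NULL;
}
```

A converter built this way would call find_jp_override first and fall back to the regular table when it returns NULL.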

ISO-2022-JP supports the following character groups.

ASCII
JIS romaji
JIS X 0208-1983
Half-width kana

However, JIS romaji is supported only for conversion from ISO-2022-JP, where it is treated the same as ASCII.
Half-width kana is likewise supported only for conversion from ISO-2022-JP; when converting to ISO-2022-JP, half-width kana is converted to full-width kana.

In the internal character encoding, the private area (1880 characters) for ISO-2022-JP and Shift_JIS is defined in the ranges shown below, with characters corresponding in code order.
Private-area characters can be converted from ISO-2022-JP, but converting them to ISO-2022-JP returns ENC_ERR_NO_MAP_RULE.

Internal Character Encoding	ISO-2022-JP	Shift_JIS
0xE000 - 0xE757	0x7F21 - 0x927E	0xF040 - 0xF9FC
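
The code-order correspondence can be worked out arithmetically. The sketch below assumes that each ISO-2022-JP row contributes 94 cells (0x21 to 0x7E) and that each Shift_JIS lead byte contributes 188 trail bytes (0x40 to 0x7E and 0x80 to 0xFC). Both assumptions give exactly 1880 characters, so the last internal code is 0xE000 + 1879 = 0xE757. The function names are hypothetical and not part of ENC API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PRIVATE_FIRST = 0xE000, PRIVATE_COUNT = 1880 };

/* Maps a Shift_JIS private-area code (0xF040..0xF9FC) to the internal
 * character encoding. Returns 1 on success, 0 if out of range. */
int sjis_private_to_internal(uint16_t sjis, uint16_t *out)
{
    assert(out != NULL);
    unsigned lead = sjis >> 8;
    unsigned trail = sjis & 0xFF;

    if (lead < 0xF0 || lead > 0xF9) {
        return 0;
    }
    if (trail < 0x40 || trail == 0x7F || trail > 0xFC) {
        return 0;
    }
    /* Trail bytes skip 0x7F, so close the gap for 0x80 and above. */
    unsigned cell = trail - 0x40 - (trail > 0x7F ? 1u : 0u);
    unsigned index = (lead - 0xF0) * 188u + cell;
    assert(index < PRIVATE_COUNT);
    *out = (uint16_t)(PRIVATE_FIRST + index);
    return 1;
}

/* Maps an ISO-2022-JP private-area code (0x7F21..0x927E) to the
 * internal character encoding. Returns 1 on success, 0 if out of range. */
int jis_private_to_internal(uint16_t jis, uint16_t *out)
{
    assert(out != NULL);
    unsigned row = jis >> 8;
    unsigned cell = jis & 0xFF;

    if (row < 0x7F || row > 0x92) {
        return 0;
    }
    if (cell < 0x21 || cell > 0x7E) {
        return 0;
    }
    unsigned index = (row - 0x7F) * 94u + (cell - 0x21);
    assert(index < PRIVATE_COUNT);
    *out = (uint16_t)(PRIVATE_FIRST + index);
    return 1;
}
```

Under these assumptions both end points line up: Shift_JIS 0xF9FC and ISO-2022-JP 0x927E each give index 1879.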

Korean Character Encoding

UHC (CP949) is supported as the encoding for Korean characters.
Be aware that its conversion table is larger than the table for Japanese or Chinese.
The supported conversion targets can be restricted to either of the following by stripping the conversion table, as described below.
In either case, the resulting conversion table is about the same size as the table for Japanese or Chinese.

KS X 1001:1992 only
UHC with the Chinese character range removed

In the former case, the conversion target is the KS X 1001:1992 character set, which includes 2350 Hangul characters.
In the latter case, all Hangul are supported, at the cost of excluding Chinese characters from the conversion targets.

Conversion of the private area is not supported in either direction.

Chinese Character Encoding

For the Chinese character encoding, characters not found in GB2312-80 are converted according to the internal fonts of the console.

Conversion of the private area is not supported in either direction.

Stripping the Conversion Table

If some of the relatively large conversion tables are not needed, they can be stripped by defining a macro within the program.
An attempt to convert between the internal character encoding and a character encoding whose conversion table has been stripped results in an error. For the error code returned in this case, see the manual entry for each function.

The currently supported macros are shown below.

Macros	Character Encoding
ENC_STRIP_TABLE_JP	ISO-2022-JP Shift_JIS
ENC_STRIP_TABLE_KR_KANJI	UHC Chinese character region
ENC_STRIP_TABLE_KR_UHC	UHC extended Hangul region
ENC_STRIP_TABLE_KR	UHC
ENC_STRIP_TABLE_CN	GB2312
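
The effect of the whole-table macros can be modeled with ordinary conditional compilation. The sketch below is self-contained and does not call ENC API: the helper table_is_available is hypothetical, and where in a real program the macros must be defined for the library to pick them up is not stated on this page. The two partial macros (ENC_STRIP_TABLE_KR_KANJI and ENC_STRIP_TABLE_KR_UHC) are left out because they remove only part of the UHC table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Example configuration: strip the Japanese and Chinese tables. */
#define ENC_STRIP_TABLE_JP
#define ENC_STRIP_TABLE_CN

/* Hypothetical helper modeling the table above: returns 0 if the whole
 * conversion table for encoding has been stripped, 1 otherwise. */
int table_is_available(const char *encoding)
{
    assert(encoding != NULL);

#ifdef ENC_STRIP_TABLE_JP
    if (strcmp(encoding, "ISO-2022-JP") == 0 ||
        strcmp(encoding, "Shift_JIS") == 0) {
        return 0;
    }
#endif
#ifdef ENC_STRIP_TABLE_KR
    if (strcmp(encoding, "UHC") == 0) {
        return 0;
    }
#endif
#ifdef ENC_STRIP_TABLE_CN
    if (strcmp(encoding, "GB2312") == 0) {
        return 0;
    }
#endif
    return 1;
}
```

With this configuration, Shift_JIS and GB2312 are reported as stripped while UHC remains available.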

Revision History

2008/10/21 Revised the part where a mention of ENC_ERR_NOT_LOADED still remained.
2008/02/21 Added character codes for Korean and Chinese.
2007/02/05 Added a description about stripping conversion tables.
2006/11/14 Revised description of the private area.
2006/10/24 Initial version.

CONFIDENTIAL