<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<link rel="stylesheet" type="text/css" href="../CSS/revolution.css" />
<title>ENC API Introduction</title>
</head>

<body>

<h1>ENC API</h1>

<h2>Introduction</h2>
<p>
    The ENC API provides character encoding conversion features.
</p>

<h2>Supported Character Encodings</h2>
<p>
    The internal character encoding of Revolution is UTF-16BE.
</p>
<p>
    The ENC API supports bi-directional conversion between the following encodings and the internal character encoding:
</p>
<ul>
    <li>US-ASCII</li>
    <li>UTF-8</li>
    <li>UTF-16BE</li>
    <li>UTF-32BE</li>
    <li>ISO-8859-1</li>
    <li>ISO-8859-2</li>
    <li>ISO-8859-3</li>
    <li>ISO-8859-7</li>
    <li>ISO-8859-10</li>
    <li>ISO-8859-15</li>
    <li>ISO-2022-JP</li>
    <li>Shift_JIS</li>
    <li>UHC</li>
    <li>GB2312</li>
    <li>windows-1252</li>
</ul>
<p>
    The ENC API supports one-way conversion from the following encodings to the internal character encoding:
</p>
<ul>
    <li>UTF-7</li>
    <li>UTF-16</li>
    <li>UTF-16LE</li>
    <li>windows-1250</li>
    <li>windows-1253</li>
    <li>macintosh</li>
    <li>x-mac-ce</li>
    <li>x-mac-greek</li>
    <li>IBM850</li>
    <li>IBM852</li>
</ul>

<h2>Character Encoding Name Matching</h2>
<p>
    The ENC API matches character encoding names by applying the following rules in order:
</p>
<ol>
    <li>Convert all alphabetic characters to lowercase.</li>
    <li>If the name starts with "x-" or "cs", remove that prefix.</li>
    <li>Remove all characters that are neither alphabetic nor numeric.</li>
    <li>Compare the result with the matching strings of the individual character encoding names.</li>
</ol>
<p>
    The individual character encoding names and their matching strings are as follows:
</p>
<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
<td
bgcolor="#C0C0C0" nowrap>Character encoding name</td>
<td bgcolor="#C0C0C0">Matching strings</td>
    </tr>
    <tr>
      <td nowrap>US-ASCII</td>
      <td>usascii<br>ascii<br>us<br>ansix341968<br>ansix341986<br>cp367<br>ibm367<br>iso646irv1991<br>iso646us<br>isoir6</td>
    </tr>
    <tr>
      <td nowrap>UTF-8</td>
      <td>utf8<br>utf8n<br>unicode11utf8<br>unicode20utf8</td>
    </tr>
    <tr>
      <td nowrap>UTF-16BE</td>
      <td>utf16be<br>ucs2<br>ucs2be<br>unicode11<br>unicode20<br>unicode20utf16<br>unicodeascii<br>unicodelatin1<br>iso10646<br>iso10646j1<br>iso10646ucs2<br>iso10646ucs2be<br>iso10646ucsbasic<br>iso10646unicodelatin1</td>
    </tr>
    <tr>
      <td nowrap>UTF-32BE</td>
      <td>utf32be<br>utf32<br>ucs4<br>ucs4be<br>iso10646ucs4<br>iso10646ucs4be</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-1</td>
      <td>iso88591<br>latin1<br>l1<br>cp819<br>ibm819<br>isolatin1<br>iso885911987<br>isoir100</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-2</td>
      <td>iso88592<br>latin2<br>l2<br>isolatin2<br>iso885921987<br>isoir101</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-3</td>
      <td>iso88593<br>latin3<br>l3<br>isolatin3<br>iso885931988<br>isoir109</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-7</td>
      <td>iso88597<br>greek<br>greek8<br>isolatingreek<br>iso885971987<br>isoir126<br>ecma118<br>elot928<br>suneugreek</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-10</td>
      <td>iso885910<br>latin6<br>isolatin6<br>l6</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-15</td>
      <td>iso885915<br>latin9<br>iso8859101992<br>isoir157</td>
    </tr>
    <tr>
      <td nowrap>ISO-2022-JP</td>
      <td>iso2022jp<br>iso2022jp1<br>iso2022jp2</td>
    </tr>
    <tr>
      <td nowrap>Shift_JIS</td>
      <td>shiftjis<br>sjis<br>mscp932<br>mskanji<br>windows31j</td>
    </tr>
    <tr>
      <td nowrap>UHC</td>
      <td>euckr<br>ksc56011987<br>isoir149<br>ksc56011989<br>ksc5601<br>korean<br>uhc<br>cp949<br>windows949</td>
    </tr>
    <tr>
      <td nowrap>GB2312</td>
<td>gb2312<br>gb231280<br>isoir58<br>chinese<br>iso58gb231280<br>euccn</td>
    </tr>
    <tr>
      <td nowrap>windows-1252</td>
      <td>windows1252<br>cp1252<br>windows30latin1<br>windows31latin1<br>iso88591windows30latin1<br>iso88591windows31latin1</td>
    </tr>
    <tr>
      <td bgcolor="#C0C0C0" nowrap>Character encoding name</td>
      <td bgcolor="#C0C0C0">Matching strings</td>
    </tr>
    <tr>
      <td nowrap>UTF-7</td>
      <td>utf7<br>unicode11utf7<br>unicode20utf7<br>cp65000</td>
    </tr>
    <tr>
      <td nowrap>UTF-16</td>
      <td>utf16<br>cp1200<br>ibm1200</td>
    </tr>
    <tr>
      <td nowrap>UTF-16LE</td>
      <td>utf16le<br>ucs2le<br>iso10646ucs2le</td>
    </tr>
    <tr>
      <td nowrap>windows-1250</td>
      <td>windows1250<br>cp1250<br>windows31latin2<br>iso88592windowslatin2</td>
    </tr>
    <tr>
      <td nowrap>windows-1253</td>
      <td>windows1253<br>cp1253</td>
    </tr>
    <tr>
      <td nowrap>macintosh</td>
      <td>macintosh<br>mac<br>macroman</td>
    </tr>
    <tr>
      <td nowrap>x-mac-ce</td>
      <td>macce</td>
    </tr>
    <tr>
      <td nowrap>x-mac-greek</td>
      <td>macgreek</td>
    </tr>
    <tr>
      <td nowrap>IBM850</td>
      <td>ibm850<br>cp850<br>850<br>pc850multilingual</td>
    </tr>
    <tr>
      <td nowrap>IBM852</td>
      <td>ibm852<br>cp852<br>852<br>pcp852</td>
    </tr>
  </tbody>
</table>

<h2>Conversion Rules of Individual Encodings</h2>
<h3>ISO-8859</h3>
<p>
    ISO-8859 encodings can be converted to the internal character encoding and back.<br>ISO-8859-1 is handled asymmetrically: it is treated as windows-1252 when converting to the internal character encoding, but as ISO-8859-1 when converting from the internal character encoding.
</p>
<h3>Japanese Character Encoding</h3>
<p>
    The ISO-2022-JP and Shift_JIS conversion rules are compatible with the conversions performed by Windows, with certain exceptions.
</p>
<p>
    When converting from the internal character encoding to ISO-2022-JP or Shift_JIS, the following mappings, which Windows does not provide, are also used.
</p>
<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
      <td bgcolor="#C0C0C0">Internal Character Encoding</td>
      <td bgcolor="#C0C0C0">ISO-2022-JP</td>
      <td bgcolor="#C0C0C0">Shift_JIS</td>
    </tr>
    <tr>
      <td>0x203E</td>
      <td>0x7E</td>
      <td>0x7E</td>
    </tr>
    <tr>
      <td>0x2014</td>
      <td>0x213D</td>
      <td>0x815C</td>
    </tr>
    <tr>
      <td>0x2016</td>
      <td>0x2142</td>
      <td>0x8161</td>
    </tr>
    <tr>
      <td>0x2212</td>
      <td>0x215D</td>
      <td>0x817C</td>
    </tr>
    <tr>
      <td>0x301C</td>
      <td>0x2141</td>
      <td>0x8160</td>
    </tr>
  </tbody>
</table>
<p>
    ISO-2022-JP supports the following character groups.
</p>
<ul>
    <li>ASCII</li>
    <li>JIS romaji</li>
    <li>JIS X 0208-1983</li>
    <li>Half-width kana</li>
</ul>
<p>
    However, JIS romaji is supported only for conversion from ISO-2022-JP, where it is treated the same as ASCII.<br>Half-width kana is likewise supported only for conversion from ISO-2022-JP; when converting to ISO-2022-JP, half-width kana is converted to full-width kana.
</p>
<p>
    The private area (1880 characters) of ISO-2022-JP and Shift_JIS is mapped, in code order, to the range of the internal character encoding shown below.<br>Characters in the private area can be converted from ISO-2022-JP, but converting them to ISO-2022-JP returns <CODE>ENC_ERR_NO_MAP_RULE</CODE>.
</p>

<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
      <td bgcolor="#C0C0C0">Internal Character Encoding</td>
      <td bgcolor="#C0C0C0">ISO-2022-JP</td>
      <td bgcolor="#C0C0C0">Shift_JIS</td>
    </tr>
    <tr>
<td>0xE000 ~ 0xE757</td>
<td>0x7F21 ~ 0x927E</td>
<td>0xF040 ~ 0xF9FC</td>
    </tr>
  </tbody>
</table>

<h3>Korean Character Encoding</h3>
<p>
    UHC (CP949) is supported as the Korean character encoding.<br />Be aware that its conversion table is larger than those for Japanese or Chinese.<br />Using the conversion table stripping described below, the supported conversion targets can be restricted to either of the following.<br />In either case, the resulting conversion table is about the same size as those for Japanese and Chinese.
</p>
<ul>
    <li>KS X 1001:1992 only</li>
    <li>UHC with the Chinese character range removed</li>
</ul>
<p>
    With the former, the conversion target is the KS X 1001:1992 character set, which contains 2350 Hangul characters.<br />With the latter, all Hangul characters are supported, and Chinese characters are excluded from the conversion targets.<br />
</p>
<p>
    Conversion of the private area is not supported in either direction.
</p>

<h3>Chinese Character Encoding</h3>
<p>
    For Chinese character encoding, characters not found in GB2312-80 are converted according to the internal fonts of the console.<br />For a listing, click <a href="./chineseextbl.html">here</a>.<br>
</p>
<p>
    Conversion of the private area is not supported in either direction.
</p>

<h2>Stripping the Conversion Table</h2>
<p>
    If your program does not need some of the relatively large conversion tables, you can strip them by defining a macro in the program.<br>Attempting to convert between the internal character encoding and an encoding whose conversion table has been stripped returns <code>ENC_ERR_NOT_LOADED</code>.
</p>
<p>
    The currently supported macros are shown below.
</p>
<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
<td bgcolor="#C0C0C0">Macro</td>
<td bgcolor="#C0C0C0">Character encoding</td>
    </tr>
    <tr>
<td>ENC_STRIP_TABLE_JP</td>
<td>ISO-2022-JP<br>Shift_JIS</td>
    </tr>
    <tr>
<td>ENC_STRIP_TABLE_KR_KANJI</td>
<td>UHC Chinese character region</td>
    </tr>
    <tr>
<td>ENC_STRIP_TABLE_KR_UHC</td>
<td>UHC extended Hangul region</td>
    </tr>
    <tr>
<td>ENC_STRIP_TABLE_KR</td>
<td>UHC</td>
    </tr>
    <tr>
<td>ENC_STRIP_TABLE_CN</td>
<td>GB2312</td>
    </tr>
  </tbody>
</table>

<h2>See Also</h2>
<p><a href="chineseextbl.html">List of Additional Conversion Rules for Chinese Character Encoding</a></p>

<h2>Revision History</h2>
<p>
2008/02/21 Added character codes for Korean and Chinese.<br>2007/02/05 Added a description about stripping conversion tables.<br>2006/11/14 Revised description of the private area.<br>2006/10/24 Initial version.<br>
</p>

<hr><p>CONFIDENTIAL</p></body>
</html>