<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<link rel="stylesheet" type="text/css" href="../CSS/revolution.css" />
<title>ENC API Introduction</title>
</head>

<body>

<h1>ENC API</h1>

<h2>Introduction</h2>
<p>
	ENC API provides character encoding conversion features.
</p>

<h2>Supported Character Encodings</h2>
<p>
	The internal character encoding of Revolution is UTF-16BE.
</p>
<p>
	ENC API supports bidirectional conversion between the following encodings and the internal character encoding:
</p>
<ul>
	<li>US-ASCII</li>
	<li>UTF-8</li>
	<li>UTF-16BE</li>
	<li>UTF-32BE</li>
	<li>ISO-8859-1</li>
	<li>ISO-8859-2</li>
	<li>ISO-8859-3</li>
	<li>ISO-8859-7</li>
	<li>ISO-8859-10</li>
	<li>ISO-8859-15</li>
	<li>ISO-2022-JP</li>
	<li>Shift_JIS</li>
	<li>UHC</li>
	<li>GB2312</li>
	<li>windows-1252</li>
</ul>
<p>
	ENC API supports one-way conversion from the following encodings to the internal character encoding:
</p>
<ul>
	<li>UTF-7</li>
	<li>UTF-16</li>
	<li>UTF-16LE</li>
	<li>windows-1250</li>
	<li>windows-1253</li>
	<li>macintosh</li>
	<li>x-mac-ce</li>
	<li>x-mac-greek</li>
	<li>IBM850</li>
	<li>IBM852</li>
</ul>

<h2>Character Encoding Name Matching</h2>
<p>
	ENC API matches character encoding names according to the following rules:
</p>
<ol>
	<li>Convert all alphabetic characters to lowercase.</li>
	<li>If the name starts with &quot;x-&quot; or &quot;cs&quot;, remove that prefix.</li>
	<li>Remove all characters that are neither alphabetic nor numeric.</li>
	<li>Compare the result with the matching strings of the individual character encoding names.</li>
</ol>
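<p>
	The first three rules can be sketched in C as follows. This is only an illustration of the rules as listed: <code>normalize_encoding_name</code> is a hypothetical helper, not an ENC API function, and the 64-byte working buffer is an arbitrary choice.
</p>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an ENC API function): writes the normalized form
 * of `name` to `out`, truncating if `out_size` is too small. */
static void normalize_encoding_name(const char *name, char *out, size_t out_size)
{
    char lower[64];
    const char *p;
    size_t n = 0;
    size_t j = 0;

    assert(name != NULL && out != NULL && out_size > 0);

    /* Rule 1: convert alphabetic characters to lowercase. */
    while (name[n] != '\0' && n + 1 < sizeof lower) {
        lower[n] = (char)tolower((unsigned char)name[n]);
        n++;
    }
    lower[n] = '\0';

    /* Rule 2: remove a leading "x-" or "cs". */
    p = lower;
    if (strncmp(p, "x-", 2) == 0 || strncmp(p, "cs", 2) == 0) {
        p += 2;
    }

    /* Rule 3: remove everything that is not a letter or a digit. */
    for (; *p != '\0' && j + 1 < out_size; p++) {
        if (isalnum((unsigned char)*p)) {
            out[j++] = *p;
        }
    }
    out[j] = '\0';
}
```

<p>
	With this sketch, &quot;Shift_JIS&quot; normalizes to &quot;shiftjis&quot; and &quot;x-mac-ce&quot; normalizes to &quot;macce&quot;. The remaining rule is a plain string comparison against the matching strings.
</p>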
<p>
	The individual character encoding names and the matching strings are as follows:
</p>
<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
      <td bgcolor="#C0C0C0" nowrap>Character encoding name</td>
      <td bgcolor="#C0C0C0">Matching strings</td>
    </tr>
    <tr>
      <td nowrap>US-ASCII</td>
      <td>usascii<br>ascii<br>us<br>ansix341968<br>ansix341986<br>cp367<br>ibm367<br>iso646irv1991<br>iso646us<br>isoir6</td>
    </tr>
    <tr>
      <td nowrap>UTF-8</td>
      <td>utf8<br>utf8n<br>unicode11utf8<br>unicode20utf8</td>
    </tr>
    <tr>
      <td nowrap>UTF-16BE</td>
      <td>utf16be<br>ucs2<br>ucs2be<br>unicode11<br>unicode20<br>unicode20utf16<br>unicodeascii<br>unicodelatin1<br>iso10646<br>iso10646j1<br>iso10646ucs2<br>iso10646ucs2be<br>iso10646ucsbasic<br>iso10646unicodelatin1</td>
    </tr>
    <tr>
      <td nowrap>UTF-32BE</td>
      <td>utf32be<br>utf32<br>ucs4<br>ucs4be<br>iso10646ucs4<br>iso10646ucs4be</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-1</td>
      <td>iso88591<br>latin1<br>l1<br>cp819<br>ibm819<br>isolatin1<br>iso885911987<br>isoir100</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-2</td>
      <td>iso88592<br>latin2<br>l2<br>isolatin2<br>iso885921987<br>isoir101</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-3</td>
      <td>iso88593<br>latin3<br>l3<br>isolatin3<br>iso885931988<br>isoir109</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-7</td>
      <td>iso88597<br>greek<br>greek8<br>isolatingreek<br>iso885971987<br>isoir126<br>ecma118<br>elot928<br>suneugreek</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-10</td>
      <td>iso885910<br>latin6<br>isolatin6<br>l6</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-15</td>
      <td>iso885915<br>latin9<br>iso8859101992<br>isoir157</td>
    </tr>
    <tr>
      <td nowrap>ISO-2022-JP</td>
      <td>iso2022jp<br>iso2022jp1<br>iso2022jp2</td>
    </tr>
    <tr>
      <td nowrap>Shift_JIS</td>
      <td>shiftjis<br>sjis<br>mscp932<br>mskanji<br>windows31j</td>
    </tr>
    <tr>
      <td nowrap>UHC</td>
      <td>euckr<br>ksc56011987<br>isoir149<br>ksc56011989<br>ksc5601<br>korean<br>uhc<br>cp949<br>windows949</td>
    </tr>
    <tr>
      <td nowrap>GB2312</td>
      <td>gb2312<br>gb231280<br>isoir58<br>chinese<br>iso58gb231280<br>euccn</td>
    </tr>
    <tr>
      <td nowrap>windows-1252</td>
      <td>windows1252<br>cp1252<br>windows30latin1<br>windows31latin1<br>iso88591windows30latin1<br>iso88591windows31latin1</td>
    </tr>
    <tr>
      <td bgcolor="#C0C0C0" nowrap>Character encoding name</td>
      <td bgcolor="#C0C0C0">Matching strings</td>
    </tr>
    <tr>
      <td nowrap>UTF-7</td>
      <td>utf7<br>unicode11utf7<br>unicode20utf7<br>cp65000</td>
    </tr>
    <tr>
      <td nowrap>UTF-16</td>
      <td>utf16<br>cp1200<br>ibm1200</td>
    </tr>
    <tr>
      <td nowrap>UTF-16LE</td>
      <td>utf16le<br>ucs2le<br>iso10646ucs2le</td>
    </tr>
    <tr>
      <td nowrap>windows-1250</td>
      <td>windows1250<br>cp1250<br>windows31latin2<br>iso88592windowslatin2</td>
    </tr>
    <tr>
      <td nowrap>windows-1253</td>
      <td>windows1253<br>cp1253</td>
    </tr>
    <tr>
      <td nowrap>macintosh</td>
      <td>macintosh<br>mac<br>macroman</td>
    </tr>
    <tr>
      <td nowrap>x-mac-ce</td>
      <td>macce</td>
    </tr>
    <tr>
      <td nowrap>x-mac-greek</td>
      <td>macgreek</td>
    </tr>
    <tr>
      <td nowrap>IBM850</td>
      <td>ibm850<br>cp850<br>850<br>pc850multilingual</td>
    </tr>
    <tr>
      <td nowrap>IBM852</td>
      <td>ibm852<br>cp852<br>852<br>pcp852</td>
    </tr>
  </tbody>
</table>
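<p>
	The final comparison step can be pictured as a lookup in an alias table. The sketch below copies only a few rows from the table above, and the names <code>EncodingAlias</code>, <code>kAliases</code>, and <code>find_encoding</code> are hypothetical. How ENC API stores its matching strings internally is not described in this document.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *match;      /* normalized matching string */
    const char *canonical;  /* character encoding name */
} EncodingAlias;

/* A small excerpt of the table; a complete table would list every row. */
static const EncodingAlias kAliases[] = {
    { "usascii",  "US-ASCII"    },
    { "ascii",    "US-ASCII"    },
    { "utf8",     "UTF-8"       },
    { "iso88591", "ISO-8859-1"  },
    { "latin1",   "ISO-8859-1"  },
    { "shiftjis", "Shift_JIS"   },
    { "sjis",     "Shift_JIS"   },
    { "euckr",    "UHC"         },
    { "cp949",    "UHC"         },
    { "macce",    "x-mac-ce"    }
};

/* Returns the character encoding name for an already normalized string,
 * or NULL when no matching string is equal to it. */
static const char *find_encoding(const char *normalized)
{
    size_t i;

    assert(normalized != NULL);
    for (i = 0; i < sizeof kAliases / sizeof kAliases[0]; i++) {
        if (strcmp(kAliases[i].match, normalized) == 0) {
            return kAliases[i].canonical;
        }
    }
    return NULL;
}
```

<p>
	The input is expected to be normalized already, so a name such as &quot;x-mac-ce&quot; is looked up as &quot;macce&quot;.
</p>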

<h2>Conversion Rules of Individual Encodings</h2>
<h3>ISO-8859</h3>
<p>
	The ISO-8859 encodings are converted to the internal character encoding and back.<br>ISO-8859-1 is an exception: the data is treated as windows-1252 when converting to the internal encoding, but as ISO-8859-1 when converting from the internal encoding.
</p>
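<p>
	The asymmetry is visible for bytes 0x80 to 0x9F, where windows-1252 assigns printable characters. The sketch below is illustrative only. The function names are hypothetical, and passing the five bytes that windows-1252 leaves undefined through unchanged is an assumption, since this document does not say how ENC API handles them.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* windows-1252 assignments for bytes 0x80..0x9F.
 * 0 marks the five bytes that windows-1252 leaves undefined. */
static const uint16_t kCp1252High[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* To the internal encoding: the byte is read as windows-1252.
 * Assumption: undefined bytes map to the code point of the same value. */
static uint16_t latin1_byte_to_utf16(uint8_t b)
{
    if (b >= 0x80 && b <= 0x9F && kCp1252High[b - 0x80] != 0) {
        return kCp1252High[b - 0x80];
    }
    return b;
}

/* From the internal encoding: strict ISO-8859-1, so only U+0000..U+00FF
 * can be written. Returns 1 on success, 0 when there is no mapping. */
static int utf16_to_latin1_byte(uint16_t c, uint8_t *out)
{
    assert(out != NULL);
    if (c > 0x00FF) {
        return 0;
    }
    *out = (uint8_t)c;
    return 1;
}
```

<p>
	Under these assumptions, byte 0x80 decodes to U+20AC (the euro sign), but U+20AC cannot be encoded back to ISO-8859-1.
</p>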
<h3>Japanese Character Encoding</h3>
<p>
	The ISO-2022-JP and Shift_JIS conversion rules are compatible with the Windows conversions, with certain exceptions.
</p>
<p>
	When converting from the internal character encoding to ISO-2022-JP or Shift_JIS, the following mappings, which the Windows conversions do not include, are also used.
</p>
<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
      <td bgcolor="#C0C0C0">Internal Character Encoding</td>
      <td bgcolor="#C0C0C0">ISO-2022-JP</td>
      <td bgcolor="#C0C0C0">Shift_JIS</td>
    </tr>
    <tr>
      <td>0x203E</td>
      <td>0x7E</td>
      <td>0x7E</td>
    </tr>
    <tr>
      <td>0x2014</td>
      <td>0x213D</td>
      <td>0x815C</td>
    </tr>
    <tr>
      <td>0x2016</td>
      <td>0x2142</td>
      <td>0x8161</td>
    </tr>
    <tr>
      <td>0x2212</td>
      <td>0x215D</td>
      <td>0x817C</td>
    </tr>
    <tr>
      <td>0x301C</td>
      <td>0x2141</td>
      <td>0x8160</td>
    </tr>
  </tbody>
</table>
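<p>
	For reference, the table above can be written as a small lookup in C. The values are copied from the table, while the type and function names are hypothetical and say nothing about how ENC API stores these mappings.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t unicode;  /* internal character encoding (UTF-16) */
    uint16_t jis;      /* ISO-2022-JP code */
    uint16_t sjis;     /* Shift_JIS code */
} JpExtraMapping;

/* The five additional encode-direction mappings from the table. */
static const JpExtraMapping kJpExtra[] = {
    { 0x203E, 0x007E, 0x007E },
    { 0x2014, 0x213D, 0x815C },
    { 0x2016, 0x2142, 0x8161 },
    { 0x2212, 0x215D, 0x817C },
    { 0x301C, 0x2141, 0x8160 }
};

/* Returns the additional mapping for `unicode`, or NULL if there is none. */
static const JpExtraMapping *find_jp_extra(uint16_t unicode)
{
    size_t i;

    for (i = 0; i < sizeof kJpExtra / sizeof kJpExtra[0]; i++) {
        if (kJpExtra[i].unicode == unicode) {
            return &kJpExtra[i];
        }
    }
    return NULL;
}
```

<p>
	A converter built this way would consult the lookup when encoding, in addition to the Windows-compatible rules.
</p>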
<p>
	ISO-2022-JP supports the following character groups.
</p>
<ul>
	<li>ASCII</li>
	<li>JIS romaji</li>
	<li>JIS X 0208-1983</li>
	<li>Half-width kana</li>
</ul>
<p>
	However, JIS romaji is supported only for conversion from ISO-2022-JP, where it is treated the same as ASCII.<br>Half-width kana is also supported only for conversion from ISO-2022-JP. When converting to ISO-2022-JP, half-width kana is converted to full-width kana.
</p>
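<p>
	In ISO-2022-JP data, these character groups are selected with escape sequences. The sketch below uses the commonly used designations (ESC ( B for ASCII, ESC ( J for JIS romaji, ESC $ B for JIS X 0208-1983, ESC ( I for half-width kana). That ENC API recognizes exactly these sequences is an assumption, and all names in the code are hypothetical.
</p>

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    JP_SET_UNKNOWN,
    JP_SET_ASCII,
    JP_SET_JIS_ROMAN,
    JP_SET_JIS_X_0208,
    JP_SET_HALFWIDTH_KANA
} JpCharset;

/* Identifies the character group designated by a 3-byte escape sequence. */
static JpCharset classify_escape(const unsigned char *p, size_t len)
{
    assert(p != NULL);
    if (len < 3 || p[0] != 0x1B) {
        return JP_SET_UNKNOWN;
    }
    if (p[1] == '(' && p[2] == 'B') return JP_SET_ASCII;
    if (p[1] == '(' && p[2] == 'J') return JP_SET_JIS_ROMAN;
    if (p[1] == '$' && p[2] == 'B') return JP_SET_JIS_X_0208;
    if (p[1] == '(' && p[2] == 'I') return JP_SET_HALFWIDTH_KANA;
    return JP_SET_UNKNOWN;
}

/* From ISO-2022-JP: JIS romaji is handled the same as ASCII. */
static JpCharset decode_as(JpCharset set)
{
    return (set == JP_SET_JIS_ROMAN) ? JP_SET_ASCII : set;
}

/* To ISO-2022-JP: only ASCII and JIS X 0208-1983 are written, because
 * JIS romaji and half-width kana are one-way (input only). */
static int can_encode_to(JpCharset set)
{
    return set == JP_SET_ASCII || set == JP_SET_JIS_X_0208;
}
```

<p>
	The two small predicates restate the one-way rules: JIS romaji input is read as ASCII, and neither JIS romaji nor half-width kana is produced on output.
</p>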
<p>
	In the internal character encoding, the private area (1880 characters) of ISO-2022-JP and Shift_JIS is assigned to the ranges shown below, in code order.<br>Private-area characters can be converted from ISO-2022-JP, but converting them to ISO-2022-JP returns <code>ENC_ERR_NO_MAP_RULE</code>.
</p>

<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
      <td bgcolor="#C0C0C0">Internal Character Encoding</td>
      <td bgcolor="#C0C0C0">ISO-2022-JP</td>
      <td bgcolor="#C0C0C0">Shift_JIS</td>
    </tr>
    <tr>
      <td>0xE000 ~ 0xE757</td>
      <td>0x7F21 ~ 0x927E</td>
      <td>0xF040 ~ 0xF9FC</td>
    </tr>
  </tbody>
</table>
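<p>
	The code-order correspondence can be worked out arithmetically: 1880 characters are 20 rows of 94 cells in ISO-2022-JP, or 10 lead bytes of 188 trail bytes in Shift_JIS. The sketch below is one reading of that correspondence based on the range endpoints. The per-character arithmetic and the function names are assumptions, not taken from the ENC API implementation.
</p>

```c
#include <assert.h>
#include <stdint.h>

#define PUA_FIRST 0xE000u  /* first private-area code in the internal encoding */

/* ISO-2022-JP private area (0x7F21..0x927E) to internal code.
 * Returns 0 if `jis` is outside the private area. */
static uint16_t jis_to_pua(uint16_t jis)
{
    unsigned row = jis >> 8;
    unsigned cell = jis & 0xFFu;

    if (row < 0x7F || row > 0x92 || cell < 0x21 || cell > 0x7E) {
        return 0;
    }
    return (uint16_t)(PUA_FIRST + (row - 0x7F) * 94u + (cell - 0x21));
}

/* Shift_JIS private area (0xF040..0xF9FC) to internal code.
 * Trail bytes run 0x40..0x7E and 0x80..0xFC; 0x7F is never a trail byte.
 * Returns 0 if `sjis` is outside the private area. */
static uint16_t sjis_to_pua(uint16_t sjis)
{
    unsigned lead = sjis >> 8;
    unsigned trail = sjis & 0xFFu;
    unsigned t;

    if (lead < 0xF0 || lead > 0xF9) return 0;
    if (trail < 0x40 || trail == 0x7F || trail > 0xFC) return 0;
    t = (trail < 0x7F) ? trail - 0x40 : trail - 0x41;
    return (uint16_t)(PUA_FIRST + (lead - 0xF0) * 188u + t);
}

/* Internal code to Shift_JIS private area. Returns 0 if out of range. */
static uint16_t pua_to_sjis(uint16_t c)
{
    unsigned offset, t, trail;

    if (c < PUA_FIRST || c >= PUA_FIRST + 1880u) return 0;
    offset = c - PUA_FIRST;
    t = offset % 188u;
    trail = (t < 63u) ? 0x40u + t : 0x41u + t;  /* skip 0x7F */
    return (uint16_t)(((0xF0u + offset / 188u) << 8) | trail);
}
```

<p>
	No function for converting the internal code back to ISO-2022-JP is shown, because that direction returns <code>ENC_ERR_NO_MAP_RULE</code>.
</p>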

<h3>Korean Character Encoding</h3>
<p>
	UHC (CP949) is supported as the encoding for Korean characters.<br />Be aware that its conversion table is larger than the tables for Japanese or Chinese.<br />The supported conversion range can be restricted to either of the following by stripping the conversion table, as described in the section on stripping the conversion table.<br />In either case, the resulting table is about the same size as the table for Japanese or Chinese.
</p>
<ul>
	<li>KS X 1001:1992 only</li>
	<li>UHC with the Chinese characters removed</li>
</ul>
<p>
	With the former, the conversion target is the KS X 1001:1992 character set, which contains 2350 Hangul characters.<br />With the latter, all Hangul characters are supported, but Chinese characters are excluded from conversion.
</p>
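<p>
	To show which two-byte codes fall into each group, the following sketch classifies a UHC code using the published KS X 1001 and CP949 layouts (Hangul in rows 0xB0 to 0xC8, Chinese characters in rows 0xCA to 0xFD, extended Hangul outside the 94 x 94 grid). The names are hypothetical, and whether ENC API splits its tables along exactly these boundaries is an assumption.
</p>

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    UHC_INVALID,
    UHC_KSX1001_HANGUL,   /* 25 rows x 94 cells = 2350 Hangul */
    UHC_KSX1001_HANJA,    /* Chinese characters in KS X 1001 */
    UHC_KSX1001_OTHER,    /* symbols and other KS X 1001 rows */
    UHC_EXTENDED_HANGUL   /* Hangul added by UHC */
} UhcRegion;

/* Trail bytes 0x41..0x5A and 0x61..0x7A are used by every extension row. */
static int is_letter_trail(unsigned char t)
{
    return (t >= 0x41 && t <= 0x5A) || (t >= 0x61 && t <= 0x7A);
}

/* Classifies the two-byte UHC code stored at p[0] (lead) and p[1] (trail). */
static UhcRegion classify_uhc(const unsigned char *p)
{
    unsigned char lead, trail;

    assert(p != NULL);
    lead = p[0];
    trail = p[1];

    /* KS X 1001: both bytes in 0xA1..0xFE. */
    if (lead >= 0xA1 && lead <= 0xFE && trail >= 0xA1 && trail <= 0xFE) {
        if (lead >= 0xB0 && lead <= 0xC8) return UHC_KSX1001_HANGUL;
        if (lead >= 0xCA && lead <= 0xFD) return UHC_KSX1001_HANJA;
        return UHC_KSX1001_OTHER;
    }
    /* UHC extension: three blocks with different trail-byte ranges. */
    if (lead >= 0x81 && lead <= 0xA0) {
        if (is_letter_trail(trail) || (trail >= 0x81 && trail <= 0xFE)) {
            return UHC_EXTENDED_HANGUL;
        }
    } else if (lead >= 0xA1 && lead <= 0xC5) {
        if (is_letter_trail(trail) || (trail >= 0x81 && trail <= 0xA0)) {
            return UHC_EXTENDED_HANGUL;
        }
    } else if (lead == 0xC6) {
        if (trail >= 0x41 && trail <= 0x52) {
            return UHC_EXTENDED_HANGUL;
        }
    }
    return UHC_INVALID;
}
```

<p>
	In these terms, the first option keeps the three KS X 1001 groups and drops the extended Hangul, and the second option keeps everything except the Chinese characters.
</p>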
<p>
	Conversion of the private area is not supported in either direction.
</p>

<h3>Chinese Character Encoding</h3>
<p>
	For the Chinese character encoding, characters not found in GB2312-80 are converted according to the internal fonts of the console.<br />For a listing, see <a href="./chineseextbl.html">List of Additional Conversion Rules for Chinese Character Encoding</a>.
</p>
<p>
	Conversion of the private area is not supported in either direction.
</p>

<h2>Stripping the Conversion Table</h2>
<p>
	If some of the relatively large conversion tables are not needed, they can be stripped by defining a macro within the program.<br>An attempt to convert between the internal character encoding and a character encoding whose conversion table has been stripped returns <code>ENC_ERR_NOT_LOADED</code>.
</p>
<p>
	The currently supported macros are shown below.
</p>
<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
      <td bgcolor="#C0C0C0">Macro</td>
      <td bgcolor="#C0C0C0">Character encoding</td>
    </tr>
    <tr>
      <td>ENC_STRIP_TABLE_JP</td>
      <td>ISO-2022-JP<br>Shift_JIS</td>
    </tr>
    <tr>
      <td>ENC_STRIP_TABLE_KR_KANJI</td>
      <td>UHC Chinese character region</td>
    </tr>
    <tr>
      <td>ENC_STRIP_TABLE_KR_UHC</td>
      <td>UHC extended Hangul region</td>
    </tr>
    <tr>
      <td>ENC_STRIP_TABLE_KR</td>
      <td>UHC</td>
    </tr>
    <tr>
      <td>ENC_STRIP_TABLE_CN</td>
      <td>GB2312</td>
    </tr>
  </tbody>
</table>
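<p>
	The effect of the macros can be pictured with a self-contained mock. Only the <code>ENC_STRIP_TABLE_*</code> macro names come from this document. The <code>Demo*</code> types, the <code>demo_convert</code> function, and the numeric result values are hypothetical stand-ins, and where the real macros must be defined (for example, before a particular header is included) is not stated here.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Strip the Korean and Chinese tables from this build. */
#define ENC_STRIP_TABLE_KR
#define ENC_STRIP_TABLE_CN

/* Hypothetical stand-ins for the encodings and result codes. */
typedef enum { DEMO_SHIFT_JIS, DEMO_UHC, DEMO_GB2312 } DemoEncoding;
typedef enum { DEMO_OK = 0, DEMO_ERR_NOT_LOADED = -1 } DemoResult;

/* Mock converter: reports whether the table for `enc` was compiled in.
 * A real converter would also transform the bytes in src[0..len). */
static DemoResult demo_convert(DemoEncoding enc, const unsigned char *src, size_t len)
{
    assert(src != NULL);
    (void)len;

    switch (enc) {
    case DEMO_SHIFT_JIS:
#ifdef ENC_STRIP_TABLE_JP
        return DEMO_ERR_NOT_LOADED;
#else
        return DEMO_OK;
#endif
    case DEMO_UHC:
#ifdef ENC_STRIP_TABLE_KR
        return DEMO_ERR_NOT_LOADED;
#else
        return DEMO_OK;
#endif
    case DEMO_GB2312:
#ifdef ENC_STRIP_TABLE_CN
        return DEMO_ERR_NOT_LOADED;
#else
        return DEMO_OK;
#endif
    }
    return DEMO_ERR_NOT_LOADED;  /* unknown encoding value */
}
```

<p>
	In this mock build, Shift_JIS conversion still succeeds, while UHC and GB2312 conversions report the not-loaded result, mirroring the <code>ENC_ERR_NOT_LOADED</code> behavior described above.
</p>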

<h2>See Also</h2>
<p><a href="chineseextbl.html">List of Additional Conversion Rules for Chinese Character Encoding</a></p>

<h2>Revision History</h2>
<p>
2008/02/21 Added character codes for Korean and Chinese.<br>2007/02/05 Added a description about stripping conversion tables.<br>2006/11/14 Revised description of the private area.<br>2006/10/24 Initial version.<br>
</p>

<hr><p>CONFIDENTIAL</p></body>
</html>