<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<link rel="stylesheet" type="text/css" href="../CSS/revolution.css" />
<title>ENC API Introduction</title>
</head>

<body>

<h1>ENC API</h1>

<h2>Introduction</h2>
<p>
	The ENC API provides functions for converting text between character encodings.
</p>

<h2>Supported Character Encodings</h2>
<p>
	The internal character encoding of Revolution is UTF-16BE.
</p>
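<p>
	As a point of reference for the internal encoding, the following C sketch shows how a single Unicode code point is laid out in UTF-16BE: code points up to U+FFFF occupy two bytes with the high byte first, and code points above U+FFFF become a surrogate pair. The function <CODE>utf16be_encode</CODE> is a hypothetical helper written for illustration; it is not part of the ENC API, and this page does not state how the ENC API handles surrogate pairs.
</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the ENC API): writes one Unicode code
 * point as UTF-16BE bytes. Returns the number of bytes written (2 or 4),
 * or 0 if the code point is a surrogate or out of range. */
size_t utf16be_encode(uint32_t cp, unsigned char out[4])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                              /* lone surrogates are not characters */
    if (cp <= 0xFFFF) {
        out[0] = (unsigned char)(cp >> 8);     /* big-endian: high byte first */
        out[1] = (unsigned char)(cp & 0xFF);
        return 2;
    }
    if (cp <= 0x10FFFF) {
        uint32_t v = cp - 0x10000;             /* 20 bits split across two surrogates */
        uint16_t hi = (uint16_t)(0xD800 + (v >> 10));
        uint16_t lo = (uint16_t)(0xDC00 + (v & 0x3FF));
        out[0] = (unsigned char)(hi >> 8);
        out[1] = (unsigned char)(hi & 0xFF);
        out[2] = (unsigned char)(lo >> 8);
        out[3] = (unsigned char)(lo & 0xFF);
        return 4;
    }
    return 0;                                  /* beyond U+10FFFF */
}
```

<p>
	For example, U+20AC is stored as the bytes 0x20 0xAC, and U+1F600 as 0xD8 0x3D 0xDE 0x00.
</p>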
<p>
	The ENC API supports bidirectional conversion between the internal character encoding and each of the following encodings.
</p>
<ul>
	<li>US-ASCII</li>
	<li>UTF-8</li>
	<li>UTF-16BE</li>
	<li>UTF-32BE</li>
	<li>ISO-8859-1</li>
	<li>ISO-8859-2</li>
	<li>ISO-8859-3</li>
	<li>ISO-8859-7</li>
	<li>ISO-8859-10</li>
	<li>ISO-8859-15</li>
	<li>ISO-2022-JP</li>
	<li>Shift_JIS</li>
	<li>UHC</li>
	<li>GB2312</li>
	<li>windows-1252</li>
</ul>
<p>
	The ENC API supports only one-way conversion from the following encodings to the internal character encoding.
</p>
<ul>
	<li>UTF-7</li>
	<li>UTF-16</li>
	<li>UTF-16LE</li>
	<li>windows-1250</li>
	<li>windows-1253</li>
	<li>macintosh</li>
	<li>x-mac-ce</li>
	<li>x-mac-greek</li>
	<li>IBM850</li>
	<li>IBM852</li>
</ul>

<h2>Character Encoding Name Matching</h2>
<p>
	The ENC API matches character encoding names using the following rules.
</p>
<ol>
	<li>Convert all letters to lowercase.</li>
	<li>If the name begins with &quot;x-&quot; or &quot;cs&quot;, remove that prefix.</li>
	<li>Remove all characters that are not letters or digits.</li>
	<li>Compare the result with the matching strings defined for each character encoding name.</li>
</ol>
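<p>
	Steps 1 through 3 can be sketched in C as follows. This is an illustrative reimplementation that assumes the steps are applied in the order listed and that at most one prefix is removed; <CODE>enc_normalize_name</CODE> is a hypothetical name, not an ENC API function, and the fixed 64-byte working buffer is an arbitrary choice of the sketch.
</p>

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper illustrating the matching rules; not an ENC API
 * function. Writes the normalized key for `name` into `out` (capacity
 * `cap`, which must be at least 1) and returns `out`. */
char *enc_normalize_name(const char *name, char *out, size_t cap)
{
    char lower[64];
    size_t i, n = 0;
    const char *p;

    /* Step 1: lowercase (names longer than 63 bytes are truncated here). */
    for (i = 0; name[i] != '\0' && i < sizeof lower - 1; i++)
        lower[i] = (char)tolower((unsigned char)name[i]);
    lower[i] = '\0';

    /* Step 2: drop a leading "x-" or "cs" prefix. */
    p = lower;
    if (strncmp(p, "x-", 2) == 0 || strncmp(p, "cs", 2) == 0)
        p += 2;

    /* Step 3: keep only letters and digits. */
    for (; *p != '\0' && n < cap - 1; p++)
        if (isalnum((unsigned char)*p))
            out[n++] = *p;
    out[n] = '\0';

    return out;   /* Step 4 compares this key with the matching strings. */
}
```

<p>
	With this sketch, &quot;x-mac-ce&quot; becomes macce, &quot;csISOLatin1&quot; becomes isolatin1, and &quot;Shift_JIS&quot; becomes shiftjis, each of which appears as a matching string in the table below.
</p>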
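<p>
	The character encoding names and their matching strings are listed below. The table is divided by a second header row: encodings above it support bidirectional conversion, and encodings below it support only one-way conversion to the internal character encoding.
</p>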
<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
<td bgcolor="#C0C0C0" nowrap>Character Encoding Name</td>
<td bgcolor="#C0C0C0">Matching Strings</td>
    </tr>
    <tr>
      <td nowrap>US-ASCII</td>
      <td>usascii<br>ascii<br>us<br>ansix341968<br>ansix341986<br>cp367<br>ibm367<br>iso646irv1991<br>iso646us<br>isoir6</td>
    </tr>
    <tr>
      <td nowrap>UTF-8</td>
      <td>utf8<br>utf8n<br>unicode11utf8<br>unicode20utf8</td>
    </tr>
    <tr>
      <td nowrap>UTF-16BE</td>
      <td>utf16be<br>ucs2<br>ucs2be<br>unicode11<br>unicode20<br>unicode20utf16<br>unicodeascii<br>unicodelatin1<br>iso10646<br>iso10646j1<br>iso10646ucs2<br>iso10646ucs2be<br>iso10646ucsbasic<br>iso10646unicodelatin1</td>
    </tr>
    <tr>
      <td nowrap>UTF-32BE</td>
      <td>utf32be<br>utf32<br>ucs4<br>ucs4be<br>iso10646ucs4<br>iso10646ucs4be</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-1</td>
      <td>iso88591<br>latin1<br>l1<br>cp819<br>ibm819<br>isolatin1<br>iso885911987<br>isoir100</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-2</td>
      <td>iso88592<br>latin2<br>l2<br>isolatin2<br>iso885921987<br>isoir101</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-3</td>
      <td>iso88593<br>latin3<br>l3<br>isolatin3<br>iso885931988<br>isoir109</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-7</td>
      <td>iso88597<br>greek<br>greek8<br>isolatingreek<br>iso885971987<br>isoir126<br>ecma118<br>elot928<br>suneugreek</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-10</td>
      <td>iso885910<br>latin6<br>isolatin6<br>l6</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-15</td>
      <td>iso885915<br>latin9<br>iso8859101992<br>isoir157</td>
    </tr>
    <tr>
      <td nowrap>ISO-2022-JP</td>
      <td>iso2022jp<br>iso2022jp1<br>iso2022jp2</td>
    </tr>
    <tr>
      <td nowrap>Shift_JIS</td>
      <td>shiftjis<br>sjis<br>mscp932<br>mskanji<br>windows31j</td>
    </tr>
    <tr>
      <td nowrap>UHC</td>
      <td>euckr<br>ksc56011987<br>isoir149<br>ksc56011989<br>ksc5601<br>korean<br>uhc<br>cp949<br>windows949</td>
    </tr>
    <tr>
      <td nowrap>GB2312</td>
      <td>gb2312<br>gb231280<br>isoir58<br>chinese<br>iso58gb231280<br>euccn</td>
    </tr>
    <tr>
      <td nowrap>windows-1252</td>
      <td>windows1252<br>cp1252<br>windows30latin1<br>windows31latin1<br>iso88591windows30latin1<br>iso88591windows31latin1</td>
    </tr>
    <tr>
<td bgcolor="#C0C0C0" nowrap>Character Encoding Name</td>
<td bgcolor="#C0C0C0">Matching Strings</td>
    </tr>
    <tr>
      <td nowrap>UTF-7</td>
      <td>utf7<br>unicode11utf7<br>unicode20utf7<br>cp65000</td>
    </tr>
    <tr>
      <td nowrap>UTF-16</td>
      <td>utf16<br>cp1200<br>ibm1200</td>
    </tr>
    <tr>
      <td nowrap>UTF-16LE</td>
      <td>utf16le<br>ucs2le<br>iso10646ucs2le</td>
    </tr>
    <tr>
      <td nowrap>windows-1250</td>
      <td>windows1250<br>cp1250<br>windows31latin2<br>iso88592windowslatin2</td>
    </tr>
    <tr>
      <td nowrap>windows-1253</td>
      <td>windows1253<br>cp1253</td>
    </tr>
    <tr>
      <td nowrap>macintosh</td>
      <td>macintosh<br>mac<br>macroman</td>
    </tr>
    <tr>
      <td nowrap>x-mac-ce</td>
      <td>macce</td>
    </tr>
    <tr>
      <td nowrap>x-mac-greek</td>
      <td>macgreek</td>
    </tr>
    <tr>
      <td nowrap>IBM850</td>
      <td>ibm850<br>cp850<br>850<br>pc850multilingual</td>
    </tr>
    <tr>
      <td nowrap>IBM852</td>
      <td>ibm852<br>cp852<br>852<br>pcp852</td>
    </tr>
  </tbody>
</table>
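<p>
	Step 4 amounts to a table lookup on the normalized string. The C sketch below copies a few rows from the table above and assumes that a match requires exact string equality; the structure and function names are hypothetical, and a complete implementation would list every matching string.
</p>

```c
#include <stddef.h>
#include <string.h>

/* One matching string and the encoding name it selects (hypothetical type). */
struct enc_alias {
    const char *canonical;   /* character encoding name */
    const char *key;         /* normalized matching string */
};

/* A few rows copied from the table above; the full table has many more. */
static const struct enc_alias enc_aliases[] = {
    { "US-ASCII",   "usascii"  }, { "US-ASCII",   "ascii"      },
    { "UTF-8",      "utf8"     }, { "UTF-8",      "utf8n"      },
    { "ISO-8859-1", "iso88591" }, { "ISO-8859-1", "latin1"     },
    { "Shift_JIS",  "shiftjis" }, { "Shift_JIS",  "sjis"       },
    { "Shift_JIS",  "windows31j" },
    { "x-mac-ce",   "macce"    },
};

/* `key` must already be normalized by steps 1-3.
 * Returns the canonical encoding name, or NULL if nothing matches. */
const char *enc_lookup_normalized(const char *key)
{
    size_t i;
    for (i = 0; i < sizeof enc_aliases / sizeof enc_aliases[0]; i++)
        if (strcmp(enc_aliases[i].key, key) == 0)
            return enc_aliases[i].canonical;
    return NULL;   /* unrecognized encoding name */
}
```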

<h2>Conversion Rules of Individual Encodings</h2>
<h3>ISO-8859</h3>
<p>
	The ISO-8859 encodings are converted in both directions, to and from the internal character encoding.<br>However, ISO-8859-1 is handled asymmetrically: it is treated as windows-1252 when converting to the internal encoding, but as ISO-8859-1 when converting from the internal encoding.
</p>
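<p>
	The asymmetry matters for bytes 0x80 through 0x9F, which windows-1252 assigns to printable characters but ISO-8859-1 assigns to C1 control codes. The C sketch below illustrates the rule: decoding maps byte 0x80 to U+20AC (EURO SIGN), while encoding U+20AC to ISO-8859-1 fails because only U+0000 through U+00FF are representable. The function names are hypothetical, the return value -1 stands in for whatever error the ENC API actually reports, and the handling of the five bytes that windows-1252 leaves undefined is an assumption.
</p>

```c
#include <stdint.h>

/* windows-1252 assignments for bytes 0x80-0x9F. The five bytes windows-1252
 * leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) are mapped to the
 * same-valued C1 control here; that is an assumption of this sketch. */
static const uint16_t cp1252_80_9f[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
};

/* Decoding "ISO-8859-1" input: bytes are interpreted as windows-1252. */
uint16_t latin1_decode_byte(unsigned char b)
{
    if (b >= 0x80 && b <= 0x9F)
        return cp1252_80_9f[b - 0x80];
    return b;                    /* every other byte maps to the same code point */
}

/* Encoding to ISO-8859-1: only U+0000-U+00FF have a byte value.
 * Returns 0 on success, or -1 (placeholder error) if `c` is unmappable. */
int latin1_encode_char(uint16_t c, unsigned char *out)
{
    if (c > 0xFF)
        return -1;
    *out = (unsigned char)c;
    return 0;
}
```

<p>
	A consequence of this design is that a round trip is not always lossless: ISO-8859-1 input containing byte 0x80 decodes to U+20AC, which cannot then be encoded back to ISO-8859-1.
</p>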
<h3>Japanese Character Encoding</h3>
<p>
	The ISO-2022-JP and Shift_JIS conversion rules are compatible with those used by Windows, with the exceptions described in this section.
</p>
<p>
	When converting from the internal character encoding to ISO-2022-JP or Shift_JIS, the following additional mappings, which Windows does not use, are applied.
</p>
<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
<td bgcolor="#C0C0C0">Internal Character Encoding</td>
      <td bgcolor="#C0C0C0">ISO-2022-JP</td>
      <td bgcolor="#C0C0C0">Shift_JIS</td>
    </tr>
    <tr>
      <td>0x203E</td>
      <td>0x7E</td>
      <td>0x7E</td>
    </tr>
    <tr>
      <td>0x2014</td>
      <td>0x213D</td>
      <td>0x815C</td>
    </tr>
    <tr>
      <td>0x2016</td>
      <td>0x2142</td>
      <td>0x8161</td>
    </tr>
    <tr>
      <td>0x2212</td>
      <td>0x215D</td>
      <td>0x817C</td>
    </tr>
    <tr>
      <td>0x301C</td>
      <td>0x2141</td>
      <td>0x8160</td>
    </tr>
  </tbody>
</table>
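<p>
	The Shift_JIS column of the table above follows from the ISO-2022-JP (JIS X 0208) column by the standard JIS-to-Shift_JIS arithmetic. The C sketch below holds the five additional mappings in a lookup table and includes that arithmetic so the two columns can be checked against each other. All names are hypothetical; this is not the ENC API's implementation.
</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Internal code -> ISO-2022-JP code -> Shift_JIS code (hypothetical type). */
struct jp_extra_map { uint16_t ucs, jis, sjis; };

/* The five additional mappings from the table above. */
static const struct jp_extra_map jp_extra[] = {
    { 0x203E, 0x007E, 0x007E },  /* OVERLINE -> single-byte 0x7E */
    { 0x2014, 0x213D, 0x815C },  /* EM DASH */
    { 0x2016, 0x2142, 0x8161 },  /* DOUBLE VERTICAL LINE */
    { 0x2212, 0x215D, 0x817C },  /* MINUS SIGN */
    { 0x301C, 0x2141, 0x8160 },  /* WAVE DASH */
};

/* Standard JIS X 0208 -> Shift_JIS conversion for a two-byte code whose
 * bytes are both in 0x21-0x7E. */
uint16_t jis_to_sjis(uint16_t jis)
{
    unsigned j1 = jis >> 8, j2 = jis & 0xFF;
    unsigned s1 = ((j1 + 1) >> 1) + (j1 <= 0x5E ? 0x70 : 0xB0);
    unsigned s2;

    if (j1 & 1)                                 /* odd row: trail 0x40-0x9E */
        s2 = j2 + (j2 <= 0x5F ? 0x1F : 0x20);   /* skips trail byte 0x7F */
    else                                        /* even row: trail 0x9F-0xFC */
        s2 = j2 + 0x7E;
    return (uint16_t)((s1 << 8) | s2);
}

/* Returns the additional Shift_JIS mapping for `ucs`, or 0 if none applies. */
uint16_t jp_extra_to_sjis(uint16_t ucs)
{
    size_t i;
    for (i = 0; i < sizeof jp_extra / sizeof jp_extra[0]; i++)
        if (jp_extra[i].ucs == ucs)
            return jp_extra[i].sjis;
    return 0;
}
```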
<p>
	ISO-2022-JP supports the following character sets.
</p>
<ul>
	<li>ASCII</li>
	<li>JIS romaji</li>
	<li>JIS X 0208-1983</li>
	<li>Half-width kana</li>
</ul>
<p>
	However, JIS romaji is supported only for conversion from ISO-2022-JP, and it is treated the same as ASCII.<br>Half-width kana is likewise supported only for conversion from ISO-2022-JP; when converting to ISO-2022-JP, half-width kana characters are converted to full-width kana.
</p>
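<p>
	ISO-2022-JP switches between these character sets with escape sequences. The sketch below uses the standard designations (ESC ( B for ASCII, ESC ( J for JIS romaji, ESC $ B for JIS X 0208-1983, and ESC ( I for half-width kana); this page does not list them, so treat the code as an illustration of the rules above rather than a description of the ENC API internals. It maps JIS romaji to the same state as ASCII, so under this reading bytes such as 0x5C and 0x7E decode as their ASCII characters. Decoding half-width kana to U+FF61 through U+FF9F is an assumption, and all names are hypothetical.
</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder states for the character sets listed above. */
enum jp_charset { CS_ASCII, CS_JISX0208, CS_HALFWIDTH_KANA, CS_UNKNOWN };

/* Recognizes a designation escape sequence at `p` (which should start with
 * ESC, 0x1B). Stores the sequence length in *len (0 if unrecognized) and
 * returns the character set that follows it. */
enum jp_charset jp_parse_escape(const unsigned char *p, size_t avail, size_t *len)
{
    *len = 0;
    if (avail < 3 || p[0] != 0x1B)
        return CS_UNKNOWN;
    *len = 3;
    if (p[1] == '(' && p[2] == 'B') return CS_ASCII;           /* ASCII */
    if (p[1] == '(' && p[2] == 'J') return CS_ASCII;           /* JIS romaji, treated as ASCII */
    if (p[1] == '$' && p[2] == 'B') return CS_JISX0208;        /* JIS X 0208-1983 */
    if (p[1] == '(' && p[2] == 'I') return CS_HALFWIDTH_KANA;  /* half-width kana */
    *len = 0;
    return CS_UNKNOWN;
}

/* In the half-width kana set, bytes 0x21-0x5F are assumed to decode to
 * U+FF61-U+FF9F. Returns 0 for bytes outside that range. */
uint16_t jp_halfwidth_kana_to_ucs(unsigned char b)
{
    return (b >= 0x21 && b <= 0x5F) ? (uint16_t)(0xFF61 + (b - 0x21)) : 0;
}
```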
<p>
	In the internal character encoding, the private area (1880 characters) for ISO-2022-JP and Shift_JIS is defined in the ranges shown below, with characters assigned in code order.<br>Private-area characters can be converted from ISO-2022-JP, but converting them to ISO-2022-JP returns <CODE>ENC_ERR_NO_MAP_RULE</CODE>.
</p>

<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
<td bgcolor="#C0C0C0">Internal Character Encoding</td>
<td bgcolor="#C0C0C0">ISO-2022-JP</td>
<td bgcolor="#C0C0C0">Shift_JIS</td>
    </tr>
    <tr>
<td>0xE000 - 0xE757</td>
<td>0x7F21 - 0x927E</td>
<td>0xF040 - 0xF9FC</td>
    </tr>
  </tbody>
</table>
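<p>
	Because the private area is assigned in code order, the correspondence can be computed rather than stored. Shift_JIS has 188 trail bytes per lead byte (0x40 through 0xFC, skipping 0x7F) and ISO-2022-JP has 94 cells per row, so 10 lead bytes or 20 rows cover the 1880 characters. The helpers in the C sketch below are hypothetical; for ISO-2022-JP only the direction into the internal encoding is shown, since the other direction returns <CODE>ENC_ERR_NO_MAP_RULE</CODE>.
</p>

```c
#include <stdint.h>

#define PUA_FIRST 0xE000u
#define PUA_COUNT 1880u   /* 10 lead bytes x 188 = 20 rows x 94 */

/* Internal private-area code -> Shift_JIS code, in code order.
 * Returns 0 if `c` is outside 0xE000-0xE757. */
uint16_t pua_to_sjis(uint16_t c)
{
    unsigned idx, lead, trail;

    if (c < PUA_FIRST || c >= PUA_FIRST + PUA_COUNT)
        return 0;
    idx = c - PUA_FIRST;
    lead = 0xF0 + idx / 188;    /* 188 trail bytes per lead byte */
    trail = 0x40 + idx % 188;
    if (trail >= 0x7F)          /* Shift_JIS trail bytes skip 0x7F */
        trail++;
    return (uint16_t)((lead << 8) | trail);
}

/* ISO-2022-JP private-area code (rows 0x7F-0x92, cells 0x21-0x7E)
 * -> internal code. Returns 0 if `code` is outside that range. */
uint16_t iso2022jp_pua_to_internal(uint16_t code)
{
    unsigned row = code >> 8, col = code & 0xFF;

    if (row < 0x7F || row > 0x92 || col < 0x21 || col > 0x7E)
        return 0;
    return (uint16_t)(PUA_FIRST + (row - 0x7F) * 94 + (col - 0x21));
}
```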

<h3>Korean Character Encoding</h3>
<p>
	UHC (CP949) is supported for encoding Korean characters.<br />Be aware that its conversion table is larger than those for Japanese or Chinese.<br />By stripping conversion tables as described below, you can restrict the supported conversion target to either of the following.<br />In either case, the remaining conversion table is about the same size as the Japanese or Chinese table.
</p>
<ul>
	<li>KS X 1001:1992 only</li>
	<li>UHC with the Chinese character range excluded</li>
</ul>
<p>
	With the former, the conversion target is the KS X 1001:1992 character set, which contains 2350 Hangul characters.<br />With the latter, Chinese characters are excluded from conversion, but all Hangul characters are supported.
</p>
<p>
	Conversion of the private area is not supported in either direction.
</p>
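<p>
	The two restricted configurations can be pictured as keeping or dropping regions of the UHC code space. The C sketch below classifies a two-byte UHC code using the published CP949 layout: KS X 1001 occupies lead and trail bytes 0xA1 through 0xFE, with Hangul on lead bytes 0xB0 through 0xC8 and Chinese characters on 0xCA through 0xFD, and UHC adds 8822 Hangul syllables on lead bytes 0x81 through 0xC6. Under this reading, &quot;KS X 1001:1992 only&quot; keeps the KS X 1001 regions, and the other option keeps everything except the Chinese character region. The names are hypothetical, and the correspondence to the <CODE>ENC_STRIP_TABLE_KR_*</CODE> macros is an assumption.
</p>

```c
#include <stdint.h>

/* Hypothetical region labels for a two-byte UHC (CP949) code. */
enum uhc_region {
    UHC_KSX1001_HANGUL,   /* 2350 syllables, lead 0xB0-0xC8 */
    UHC_KSX1001_HANJA,    /* Chinese characters, lead 0xCA-0xFD */
    UHC_KSX1001_OTHER,    /* symbols, letters, user-defined and unassigned cells */
    UHC_EXTENDED_HANGUL,  /* the 8822 syllables added by UHC */
    UHC_INVALID
};

/* Trail bytes used by the UHC extension: A-Z, a-z, and 0x81-0xFE. */
static int is_uhc_ext_trail(unsigned t)
{
    return (t >= 0x41 && t <= 0x5A) || (t >= 0x61 && t <= 0x7A) ||
           (t >= 0x81 && t <= 0xFE);
}

enum uhc_region uhc_classify(uint16_t code)
{
    unsigned lead = code >> 8, trail = code & 0xFF;

    /* KS X 1001 grid (the EUC-KR range). */
    if (lead >= 0xA1 && lead <= 0xFE && trail >= 0xA1 && trail <= 0xFE) {
        if (lead >= 0xB0 && lead <= 0xC8) return UHC_KSX1001_HANGUL;
        if (lead >= 0xCA && lead <= 0xFD) return UHC_KSX1001_HANJA;
        return UHC_KSX1001_OTHER;
    }
    /* UHC extension: 178 cells per lead byte for 0x81-0xA0 ... */
    if (lead >= 0x81 && lead <= 0xA0 && is_uhc_ext_trail(trail))
        return UHC_EXTENDED_HANGUL;
    /* ... 84 cells per lead byte for 0xA1-0xC5 (trail below 0xA1) ... */
    if (lead >= 0xA1 && lead <= 0xC5 && is_uhc_ext_trail(trail) && trail <= 0xA0)
        return UHC_EXTENDED_HANGUL;
    /* ... and the final 18 cells on lead byte 0xC6. */
    if (lead == 0xC6 && trail >= 0x41 && trail <= 0x52)
        return UHC_EXTENDED_HANGUL;
    return UHC_INVALID;
}
```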

<h3>Chinese Character Encoding</h3>
<p>
	For Chinese character encoding, characters not found in GB2312-80 are converted according to the console's internal fonts.<br />For a list of these conversions, see <a href="./chineseextbl.html">List of Additional Conversion Rules for Chinese Character Encoding</a>.
</p>
<p>
	Conversion of the private area is not supported in either direction.
</p>

<h2>Stripping the Conversion Table</h2>
<p>
	If your program does not need some of the relatively large conversion tables, you can strip them by defining the corresponding macro in the program.<br>Attempting to convert between the internal character encoding and an encoding whose conversion table has been stripped results in an error. For the error code returned in this case, see the reference page for each function.
</p>
<p>
	The following macros are currently supported.
</p>
<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
<td bgcolor="#C0C0C0">Macro</td>
<td bgcolor="#C0C0C0">Stripped Conversion Table</td>
    </tr>
    <tr>
      <td><CODE>ENC_STRIP_TABLE_JP</CODE></td>
      <td>ISO-2022-JP<br>Shift_JIS</td>
    </tr>
    <tr>
<td><CODE>ENC_STRIP_TABLE_KR_KANJI</CODE></td>
<td>UHC Chinese character region</td>
    </tr>
    <tr>
<td><CODE>ENC_STRIP_TABLE_KR_UHC</CODE></td>
<td>UHC extended Hangul region</td>
    </tr>
    <tr>
      <td><CODE>ENC_STRIP_TABLE_KR</CODE></td>
      <td>UHC</td>
    </tr>
    <tr>
      <td><CODE>ENC_STRIP_TABLE_CN</CODE></td>
      <td>GB2312</td>
    </tr>
  </tbody>
</table>

<h2>See Also</h2>
<p><a href="chineseextbl.html">List of Additional Conversion Rules for Chinese Character Encoding</a></p>

<h2>Revision History</h2>
<p>
2008/10/21 Revised a passage that still mentioned <CODE>ENC_ERR_NOT_LOADED</CODE>.<br>2008/02/21 Added Korean and Chinese character encodings.<br>2007/02/05 Added a description of conversion table stripping.<br>2006/11/14 Revised the description of the private area.<br>2006/10/24 Initial version.
</p>

<hr><p>CONFIDENTIAL</p></body>
</html>