<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<link rel="stylesheet" type="text/css" href="../CSS/revolution.css" />
<title>ENC API Introduction</title>
</head>

<body>

<h1>ENC API</h1>

<h2>Introduction</h2>
<p>
	The ENC API provides character encoding conversion functions.
</p>

<h2>Supported Character Encodings</h2>
<p>
	The internal character encoding of Revolution is UTF-16BE.
</p>
<p>
	ENC API supports bi-directional conversion between the following encodings and the internal character encoding.
</p>
<ul>
	<li>US-ASCII</li>
	<li>UTF-8</li>
	<li>UTF-16BE</li>
	<li>UTF-32BE</li>
	<li>ISO-8859-1</li>
	<li>ISO-8859-2</li>
	<li>ISO-8859-3</li>
	<li>ISO-8859-7</li>
	<li>ISO-8859-10</li>
	<li>ISO-8859-15</li>
	<li>ISO-2022-JP</li>
	<li>Shift_JIS</li>
	<li>UHC</li>
	<li>GB2312</li>
	<li>windows-1252</li>
</ul>
<p>
	The ENC API supports one-way conversion from the following encodings to the internal character encoding.
</p>
<ul>
	<li>UTF-7</li>
	<li>UTF-16</li>
	<li>UTF-16LE</li>
	<li>windows-1250</li>
	<li>windows-1253</li>
	<li>macintosh</li>
	<li>x-mac-ce</li>
	<li>x-mac-greek</li>
	<li>IBM850</li>
	<li>IBM852</li>
</ul>

<h2>Character Encoding Name Matching</h2>
<p>
	ENC API matches the character encoding names based on the following rules.
</p>
<ol>
	<li>Convert all alphabetic characters to lowercase.</li>
	<li>If the name starts with &quot;x-&quot; or &quot;cs&quot;, remove that prefix.</li>
	<li>Remove all characters that are neither alphabetic nor numeric.</li>
	<li>Compare with the matching strings of individual character encoding names.</li>
</ol>
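<p>
	The first three rules can be sketched in C as shown below. This is only an illustration: <CODE>normalize_encoding_name</CODE> is a hypothetical helper and not part of the ENC API, and the sketch assumes that a prefix is removed at most once and that names fit in 63 characters.
</p>

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an ENC API function).
 * Writes the normalized form of `name` into `out` and returns its length.
 * Assumption: names longer than 63 characters are truncated. */
size_t normalize_encoding_name(const char *name, char *out, size_t out_size)
{
    char lowered[64];
    size_t n = 0, len = 0;
    const char *p;

    if (out_size == 0) return 0;

    /* Rule 1: convert alphabetic characters to lowercase. */
    for (; name[n] != '\0' && n < sizeof(lowered) - 1; n++) {
        lowered[n] = (char)tolower((unsigned char)name[n]);
    }
    lowered[n] = '\0';

    /* Rule 2: remove a leading "x-" or "cs" (assumed to apply once). */
    p = lowered;
    if (strncmp(p, "x-", 2) == 0 || strncmp(p, "cs", 2) == 0) {
        p += 2;
    }

    /* Rule 3: keep only alphabetic and numeric characters. */
    for (; *p != '\0' && len + 1 < out_size; p++) {
        if (isalnum((unsigned char)*p)) out[len++] = *p;
    }
    out[len] = '\0';
    return len;
}
```

<p>
	With this sketch, <CODE>Shift_JIS</CODE> becomes <CODE>shiftjis</CODE> and <CODE>x-mac-ce</CODE> becomes <CODE>macce</CODE>. The result is what rule 4 compares against the matching strings.
</p>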
<p>
	The individual character encoding names and the matching strings are as follows.
</p>
<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
<td bgcolor="#C0C0C0" nowrap>Character Encoding Name</td>
<td bgcolor="#C0C0C0">Matching Strings</td>
    </tr>
    <tr>
      <td nowrap>US-ASCII</td>
      <td>usascii<br>ascii<br>us<br>ansix341968<br>ansix341986<br>cp367<br>ibm367<br>iso646irv1991<br>iso646us<br>isoir6</td>
    </tr>
    <tr>
      <td nowrap>UTF-8</td>
      <td>utf8<br>utf8n<br>unicode11utf8<br>unicode20utf8</td>
    </tr>
    <tr>
      <td nowrap>UTF-16BE</td>
      <td>utf16be<br>ucs2<br>ucs2be<br>unicode11<br>unicode20<br>unicode20utf16<br>unicodeascii<br>unicodelatin1<br>iso10646<br>iso10646j1<br>iso10646ucs2<br>iso10646ucs2be<br>iso10646ucsbasic<br>iso10646unicodelatin1</td>
    </tr>
    <tr>
      <td nowrap>UTF-32BE</td>
      <td>utf32be<br>utf32<br>ucs4<br>ucs4be<br>iso10646ucs4<br>iso10646ucs4be</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-1</td>
      <td>iso88591<br>latin1<br>l1<br>cp819<br>ibm819<br>isolatin1<br>iso885911987<br>isoir100</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-2</td>
      <td>iso88592<br>latin2<br>l2<br>isolatin2<br>iso885921987<br>isoir101</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-3</td>
      <td>iso88593<br>latin3<br>l3<br>isolatin3<br>iso885931988<br>isoir109</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-7</td>
      <td>iso88597<br>greek<br>greek8<br>isolatingreek<br>iso885971987<br>isoir126<br>ecma118<br>elot928<br>suneugreek</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-10</td>
      <td>iso885910<br>latin6<br>isolatin6<br>l6</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-15</td>
      <td>iso885915<br>latin9<br>iso8859101992<br>isoir157</td>
    </tr>
    <tr>
      <td nowrap>ISO-2022-JP</td>
      <td>iso2022jp<br>iso2022jp1<br>iso2022jp2</td>
    </tr>
    <tr>
      <td nowrap>Shift_JIS</td>
      <td>shiftjis<br>sjis<br>mscp932<br>mskanji<br>windows31j</td>
    </tr>
    <tr>
      <td nowrap>UHC</td>
      <td>euckr<br>ksc56011987<br>isoir149<br>ksc56011989<br>ksc5601<br>korean<br>uhc<br>cp949<br>windows949</td>
    </tr>
    <tr>
      <td nowrap>GB2312</td>
      <td>gb2312<br>gb231280<br>isoir58<br>chinese<br>iso58gb231280<br>euccn</td>
    </tr>
    <tr>
      <td nowrap>windows-1252</td>
      <td>windows1252<br>cp1252<br>windows30latin1<br>windows31latin1<br>iso88591windows30latin1<br>iso88591windows31latin1</td>
    </tr>
    <tr>
<td bgcolor="#C0C0C0" nowrap>Character Encoding Name</td>
<td bgcolor="#C0C0C0">Matching Strings</td>
    </tr>
    <tr>
      <td nowrap>UTF-7</td>
      <td>utf7<br>unicode11utf7<br>unicode20utf7<br>cp65000</td>
    </tr>
    <tr>
      <td nowrap>UTF-16</td>
      <td>utf16<br>cp1200<br>ibm1200</td>
    </tr>
    <tr>
      <td nowrap>UTF-16LE</td>
      <td>utf16le<br>ucs2le<br>iso10646ucs2le</td>
    </tr>
    <tr>
      <td nowrap>windows-1250</td>
      <td>windows1250<br>cp1250<br>windows31latin2<br>iso88592windowslatin2</td>
    </tr>
    <tr>
      <td nowrap>windows-1253</td>
      <td>windows1253<br>cp1253</td>
    </tr>
    <tr>
      <td nowrap>macintosh</td>
      <td>macintosh<br>mac<br>macroman</td>
    </tr>
    <tr>
      <td nowrap>x-mac-ce</td>
      <td>macce</td>
    <tr>
      <td nowrap>x-mac-greek</td>
      <td>macgreek</td>
    </tr>
    <tr>
      <td nowrap>IBM850</td>
      <td>ibm850<br>cp850<br>850<br>pc850multilingual</td>
    </tr>
    <tr>
      <td nowrap>IBM852</td>
      <td>ibm852<br>cp852<br>852<br>pcp852</td>
    </tr>
  </tbody>
</table>

<h2>Conversion Rules of Individual Encodings</h2>
<h3>ISO-8859</h3>
<p>
	The ISO-8859 encodings are converted both to and from the internal character encoding.<br>ISO-8859-1 is an exception: it is treated as windows-1252 when converting to the internal character encoding, but as ISO-8859-1 when converting from the internal character encoding.
</p>
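<p>
	The following C sketch illustrates the consequence of this asymmetry. It is not the ENC implementation: the function names are hypothetical, the table holds the standard windows-1252 assignments for bytes 0x80 to 0x9F, and reporting the five bytes that windows-1252 leaves undefined as unmappable is an assumption.
</p>

```c
#include <stdint.h>

/* Standard windows-1252 assignments for bytes 0x80-0x9F.
 * 0 marks the five bytes that windows-1252 leaves undefined. */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* Hypothetical: ISO-8859-1 input is read as windows-1252.
 * Returns 1 and stores the code unit on success, 0 if unmappable
 * (assumed handling for the undefined bytes). */
int latin1_to_internal(uint8_t byte, uint16_t *out)
{
    if (byte >= 0x80 && byte <= 0x9F) {
        if (cp1252_high[byte - 0x80] == 0) return 0;
        *out = cp1252_high[byte - 0x80];
    } else {
        *out = byte;
    }
    return 1;
}

/* Hypothetical: output is ISO-8859-1, which only covers U+0000-U+00FF.
 * Returns 1 and stores the byte on success, 0 if unmappable. */
int internal_to_latin1(uint16_t unit, uint8_t *out)
{
    if (unit > 0xFF) return 0;
    *out = (uint8_t)unit;
    return 1;
}
```

<p>
	Under these assumptions, byte 0x80 converts to 0x20AC (the euro sign), but 0x20AC cannot be converted back to ISO-8859-1, so such characters do not round-trip.
</p>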
<h3>Japanese Character Encoding</h3>
<p>
	The ISO-2022-JP and Shift_JIS conversion rules are compatible with Windows conversion with certain exceptions.
</p>
<p>
	When converting from the internal character encoding to ISO-2022-JP or Shift_JIS, the following conversions, which Windows does not provide, are also used.
</p>
<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
<td bgcolor="#C0C0C0">Internal Character Encoding</td>
      <td bgcolor="#C0C0C0">ISO-2022-JP</td>
      <td bgcolor="#C0C0C0">Shift_JIS</td>
    </tr>
    <tr>
      <td>0x203E</td>
      <td>0x7E</td>
      <td>0x7E</td>
    </tr>
    <tr>
      <td>0x2014</td>
      <td>0x213D</td>
      <td>0x815C</td>
    </tr>
    <tr>
      <td>0x2016</td>
      <td>0x2142</td>
      <td>0x8161</td>
    </tr>
    <tr>
      <td>0x2212</td>
      <td>0x215D</td>
      <td>0x817C</td>
    </tr>
    <tr>
      <td>0x301C</td>
      <td>0x2141</td>
      <td>0x8160</td>
    </tr>
  </tbody>
</table>
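<p>
	As an illustration, the table above could be held as a small lookup array that is consulted when converting from the internal character encoding. The type and function names below are hypothetical and are not part of the ENC API.
</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one row of the table above. */
typedef struct {
    uint16_t internal;   /* UTF-16 code unit in the internal encoding */
    uint16_t iso2022jp;  /* ISO-2022-JP code */
    uint16_t shift_jis;  /* Shift_JIS code */
} JpOverride;

static const JpOverride jp_overrides[] = {
    { 0x203E, 0x007E, 0x007E },
    { 0x2014, 0x213D, 0x815C },
    { 0x2016, 0x2142, 0x8161 },
    { 0x2212, 0x215D, 0x817C },
    { 0x301C, 0x2141, 0x8160 },
};

/* Returns the matching row, or NULL if `unit` is not listed. */
const JpOverride *find_jp_override(uint16_t unit)
{
    size_t i;
    for (i = 0; i < sizeof(jp_overrides) / sizeof(jp_overrides[0]); i++) {
        if (jp_overrides[i].internal == unit) return &jp_overrides[i];
    }
    return NULL;
}
```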
<p>
	ISO-2022-JP supports the following character groups.
</p>
<ul>
	<li>ASCII</li>
	<li>JIS romaji</li>
	<li>JIS X 0208-1983</li>
	<li>Half-width kana</li>
</ul>
<p>
	However, JIS romaji is supported only for one-way conversion from ISO-2022-JP and is treated the same as ASCII.<br>Half-width kana is likewise supported only for one-way conversion from ISO-2022-JP; when converting to ISO-2022-JP, half-width kana is converted to full-width kana.
</p>
<p>
	In the internal character encoding, the private area (1880 characters) for ISO-2022-JP and Shift_JIS is defined by the ranges shown below, which correspond in code order.<br>Private-area characters can be converted from ISO-2022-JP, but conversion to ISO-2022-JP returns <CODE>ENC_ERR_NO_MAP_RULE</CODE>.
</p>

<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
<td bgcolor="#C0C0C0">Internal Character Encoding</td>
<td bgcolor="#C0C0C0">ISO-2022-JP</td>
<td bgcolor="#C0C0C0">Shift_JIS</td>
    </tr>
    <tr>
<td>0xE000 - 0xE757</td>
<td>0x7F21 - 0x927E</td>
<td>0xF040 - 0xF9FC</td>
    </tr>
  </tbody>
</table>
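<p>
	The code-order correspondence can be worked out arithmetically. The sketch below assumes the usual layouts: 94 cells (0x21 to 0x7E) per ISO-2022-JP row, and 188 trail bytes (0x40 to 0x7E and 0x80 to 0xFC) per Shift_JIS lead byte, which gives 20 x 94 = 10 x 188 = 1880 characters starting at 0xE000 and ending at 0xE757. The function names are hypothetical and are not part of the ENC API.
</p>

```c
#include <stdint.h>

#define PRIVATE_BASE 0xE000u  /* first internal code unit of the private area */

/* Hypothetical: maps a Shift_JIS private-area code (0xF040-0xF9FC) to the
 * internal encoding. Returns 0 if the code is outside the private area. */
uint16_t sjis_private_to_internal(uint16_t sjis)
{
    unsigned lead = sjis >> 8, trail = sjis & 0xFF;
    unsigned index;

    if (lead < 0xF0 || lead > 0xF9) return 0;
    if (trail < 0x40 || trail == 0x7F || trail > 0xFC) return 0;

    /* 188 trail bytes per lead byte; 0x7F is skipped. */
    index = (lead - 0xF0) * 188 + (trail - 0x40) - (trail > 0x7F ? 1 : 0);
    return (uint16_t)(PRIVATE_BASE + index);
}

/* Hypothetical: maps an ISO-2022-JP private-area code (0x7F21-0x927E) to the
 * internal encoding. Returns 0 if the code is outside the private area. */
uint16_t jis_private_to_internal(uint16_t jis)
{
    unsigned row = jis >> 8, cell = jis & 0xFF;

    if (row < 0x7F || row > 0x92) return 0;
    if (cell < 0x21 || cell > 0x7E) return 0;

    /* 94 cells per row. */
    return (uint16_t)(PRIVATE_BASE + (row - 0x7F) * 94 + (cell - 0x21));
}
```

<p>
	Both functions map the first code of each range to 0xE000 and the last code to 0xE757, which matches the 1880-character size of the private area.
</p>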

<h3>Korean Character Encoding</h3>
<p>
	UHC (CP949) is supported for the encoding of Korean characters.<br />Note that its conversion table is larger than those for Japanese or Chinese.<br />The supported conversion targets can be restricted to either of the following by stripping the conversion table, as described below.<br />In either case, the conversion table becomes about the same size as those for Japanese and Chinese.
</p>
<ul>
	<li>KS X 1001:1992 only</li>
	<li>UHC excluding the Chinese character range</li>
</ul>
<p>
	In the former case, the conversion target is the KS X 1001:1992 character set, which includes 2350 Hangul characters.<br />In the latter case, all Hangul characters are supported in exchange for excluding Chinese characters from the conversion targets.<br />
</p>
<p>
	Conversion of the private area is not supported in either direction.
</p>

<h3>Chinese Character Encoding</h3>
<p>
	With Chinese character encoding, conversion of characters not found in GB2312-80 is performed according to the internal fonts of the console.<br />For a listing, click <a href="./chineseextbl.html">here</a>.<br>
</p>
<p>
	Conversion of the private area is not supported in either direction.
</p>

<h2>Stripping the Conversion Table</h2>
<p>
	If some of the relatively large conversion tables are not needed, they can be stripped by defining a macro within the program.<br>An attempt to convert between the internal character encoding and a character encoding whose conversion table has been stripped results in an error. For information on the error code used in this case, see the manual entry for each function.
</p>
<p>
	The currently supported macros are shown below.
</p>
<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
<td bgcolor="#C0C0C0">Macros</td>
<td bgcolor="#C0C0C0">Character Encoding</td>
    </tr>
    <tr>
      <td>ENC_STRIP_TABLE_JP</td>
      <td>ISO-2022-JP<br>Shift_JIS</td>
    </tr>
    <tr>
<td>ENC_STRIP_TABLE_KR_KANJI</td>
<td>UHC Chinese character region</td>
    </tr>
    <tr>
<td>ENC_STRIP_TABLE_KR_UHC</td>
<td>UHC extended Hangul region</td>
    </tr>
    <tr>
      <td>ENC_STRIP_TABLE_KR</td>
      <td>UHC</td>
    </tr>
    <tr>
      <td>ENC_STRIP_TABLE_CN</td>
      <td>GB2312</td>
    </tr>
  </tbody>
</table>

<h2>See Also</h2>
<p><a href="chineseextbl.html">List of Additional Conversion Rules for Chinese Character Encoding</a></p>

<h2>Revision History</h2>
<p>
2008/10/21 Revised the part where a mention of <CODE>ENC_ERR_NOT_LOADED</CODE> still remained.<br>2008/02/21 Added character codes for Korean and Chinese.<br>2007/02/05 Added a description about stripping conversion tables.<br>2006/11/14 Revised description of the private area.<br>2006/10/24 Initial version.
</p>

<hr><p>CONFIDENTIAL</p></body>
</html>