<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<link rel="stylesheet" type="text/css" href="../CSS/revolution.css" />
<title>ENC API Introduction</title>
</head>

<body>

<h1>ENC API</h1>

<h2>Introduction</h2>
<p>
	The ENC API provides character encoding conversion functions.
</p>

<h2>Supported Character Encodings</h2>
<p>
	The internal character encoding of Revolution is UTF-16BE.
</p>
<p>
	The ENC API supports bidirectional conversion between the internal character encoding and the following encodings:
</p>
<ul>
	<li>US-ASCII</li>
	<li>UTF-8</li>
	<li>UTF-16BE</li>
	<li>UTF-32BE</li>
	<li>ISO-8859-1</li>
	<li>ISO-8859-2</li>
	<li>ISO-8859-3</li>
	<li>ISO-8859-7</li>
	<li>ISO-8859-10</li>
	<li>ISO-8859-15</li>
	<li>ISO-2022-JP</li>
	<li>Shift_JIS</li>
	<li>UHC</li>
	<li>GB2312</li>
	<li>windows-1252</li>
</ul>
<p>
	The ENC API supports only one-way conversion from the following encodings to the internal character encoding:
</p>
<ul>
	<li>UTF-7</li>
	<li>UTF-16</li>
	<li>UTF-16LE</li>
	<li>windows-1250</li>
	<li>windows-1253</li>
	<li>macintosh</li>
	<li>x-mac-ce</li>
	<li>x-mac-greek</li>
	<li>IBM850</li>
	<li>IBM852</li>
</ul>

<h2>Character Encoding Name Matching</h2>
<p>
	The ENC API matches character encoding names using the following rules:
</p>
<ol>
	<li>Convert all letters to lowercase.</li>
	<li>If the name starts with &quot;x-&quot; or &quot;cs&quot;, remove that prefix.</li>
	<li>Remove all characters that are neither letters nor digits.</li>
	<li>Compare with the matching strings of individual character encoding names.</li>
</ol>
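<p>
	Rules 1 to 3 above can be sketched in C as follows. This is an illustration only: <code>normalize_encoding_name</code> is a hypothetical helper, not an ENC API function, and it assumes the name is plain ASCII and that the rules are applied in the listed order.
</p>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the ENC API): reduces an encoding name
 * to the form that is compared against the matching strings.
 * Writes at most dst_size - 1 characters plus a terminating NUL.
 */
static void normalize_encoding_name(const char *name, char *dst, size_t dst_size)
{
    size_t len;
    size_t read;
    size_t write = 0;

    assert(name != NULL && dst != NULL && dst_size > 0);

    /* Rule 1: copy the name, converting letters to lowercase. */
    for (len = 0; name[len] != '\0' && len + 1 < dst_size; len++) {
        dst[len] = (char)tolower((unsigned char)name[len]);
    }
    dst[len] = '\0';

    /* Rule 2: skip a leading "x-" or "cs" prefix. */
    read = (strncmp(dst, "x-", 2) == 0 || strncmp(dst, "cs", 2) == 0) ? 2 : 0;

    /* Rule 3: keep only letters and digits, compacting in place. */
    for (; read < len; read++) {
        if (isalnum((unsigned char)dst[read])) {
            dst[write++] = dst[read];
        }
    }
    dst[write] = '\0';
}
```

<p>
	With this sketch, <code>Shift_JIS</code> becomes <code>shiftjis</code>, <code>x-mac-ce</code> becomes <code>macce</code>, and <code>csISOLatin1</code> becomes <code>isolatin1</code>.
</p>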
<p>
	The individual character encoding names and the matching strings are as follows:
</p>
<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
<td bgcolor="#C0C0C0" nowrap>Character encoding name</td>
<td bgcolor="#C0C0C0">Matching strings</td>
    </tr>
    <tr>
      <td nowrap>US-ASCII</td>
      <td>usascii<br>ascii<br>us<br>ansix341968<br>ansix341986<br>cp367<br>ibm367<br>iso646irv1991<br>iso646us<br>isoir6</td>
    </tr>
    <tr>
      <td nowrap>UTF-8</td>
      <td>utf8<br>utf8n<br>unicode11utf8<br>unicode20utf8</td>
    </tr>
    <tr>
      <td nowrap>UTF-16BE</td>
      <td>utf16be<br>ucs2<br>ucs2be<br>unicode11<br>unicode20<br>unicode20utf16<br>unicodeascii<br>unicodelatin1<br>iso10646<br>iso10646j1<br>iso10646ucs2<br>iso10646ucs2be<br>iso10646ucsbasic<br>iso10646unicodelatin1</td>
    </tr>
    <tr>
      <td nowrap>UTF-32BE</td>
      <td>utf32be<br>utf32<br>ucs4<br>ucs4be<br>iso10646ucs4<br>iso10646ucs4be</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-1</td>
      <td>iso88591<br>latin1<br>l1<br>cp819<br>ibm819<br>isolatin1<br>iso885911987<br>isoir100</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-2</td>
      <td>iso88592<br>latin2<br>l2<br>isolatin2<br>iso885921987<br>isoir101</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-3</td>
      <td>iso88593<br>latin3<br>l3<br>isolatin3<br>iso885931988<br>isoir109</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-7</td>
      <td>iso88597<br>greek<br>greek8<br>isolatingreek<br>iso885971987<br>isoir126<br>ecma118<br>elot928<br>suneugreek</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-10</td>
      <td>iso885910<br>latin6<br>isolatin6<br>l6</td>
    </tr>
    <tr>
      <td nowrap>ISO-8859-15</td>
      <td>iso885915<br>latin9<br>iso8859101992<br>isoir157</td>
    </tr>
    <tr>
      <td nowrap>ISO-2022-JP</td>
      <td>iso2022jp<br>iso2022jp1<br>iso2022jp2</td>
    </tr>
    <tr>
      <td nowrap>Shift_JIS</td>
      <td>shiftjis<br>sjis<br>mscp932<br>mskanji<br>windows31j</td>
    </tr>
    <tr>
      <td nowrap>UHC</td>
      <td>euckr<br>ksc56011987<br>isoir149<br>ksc56011989<br>ksc5601<br>korean<br>uhc<br>cp949<br>windows949</td>
    </tr>
    <tr>
      <td nowrap>GB2312</td>
      <td>gb2312<br>gb231280<br>isoir58<br>chinese<br>iso58gb231280<br>euccn</td>
    </tr>
    <tr>
      <td nowrap>windows-1252</td>
      <td>windows1252<br>cp1252<br>windows30latin1<br>windows31latin1<br>iso88591windows30latin1<br>iso88591windows31latin1</td>
    </tr>
    <tr>
      <td bgcolor="#C0C0C0" nowrap>Character encoding name</td>
      <td bgcolor="#C0C0C0">Matching strings</td>
    </tr>
    <tr>
      <td nowrap>UTF-7</td>
      <td>utf7<br>unicode11utf7<br>unicode20utf7<br>cp65000</td>
    </tr>
    <tr>
      <td nowrap>UTF-16</td>
      <td>utf16<br>cp1200<br>ibm1200</td>
    </tr>
    <tr>
      <td nowrap>UTF-16LE</td>
      <td>utf16le<br>ucs2le<br>iso10646ucs2le</td>
    </tr>
    <tr>
      <td nowrap>windows-1250</td>
      <td>windows1250<br>cp1250<br>windows31latin2<br>iso88592windowslatin2</td>
    </tr>
    <tr>
      <td nowrap>windows-1253</td>
      <td>windows1253<br>cp1253</td>
    </tr>
    <tr>
      <td nowrap>macintosh</td>
      <td>macintosh<br>mac<br>macroman</td>
    </tr>
    <tr>
      <td nowrap>x-mac-ce</td>
      <td>macce</td>
    </tr>
    <tr>
      <td nowrap>x-mac-greek</td>
      <td>macgreek</td>
    </tr>
    <tr>
      <td nowrap>IBM850</td>
      <td>ibm850<br>cp850<br>850<br>pc850multilingual</td>
    </tr>
    <tr>
      <td nowrap>IBM852</td>
      <td>ibm852<br>cp852<br>852<br>pcp852</td>
    </tr>
  </tbody>
</table>
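<p>
	The final comparison step can be pictured as a lookup of the already-normalized name in an alias table. The C sketch below copies only a few rows from the table above; the <code>EncodingAlias</code> type and the <code>find_encoding_name</code> function are hypothetical and do not describe how the ENC API stores its tables.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical alias entry: a matching string and the encoding it selects. */
typedef struct {
    const char *matching_string;
    const char *encoding_name;
} EncodingAlias;

/* A small excerpt of the table in this section, not the full list. */
static const EncodingAlias kAliases[] = {
    { "utf8",     "UTF-8"      },
    { "latin1",   "ISO-8859-1" },
    { "sjis",     "Shift_JIS"  },
    { "mskanji",  "Shift_JIS"  },
    { "euckr",    "UHC"        },
    { "cp949",    "UHC"        },
    { "euccn",    "GB2312"     },
    { "macce",    "x-mac-ce"   }
};

/* Returns the encoding name for a normalized string, or NULL if none matches. */
static const char *find_encoding_name(const char *normalized)
{
    size_t i;

    assert(normalized != NULL);

    for (i = 0; i < sizeof kAliases / sizeof kAliases[0]; i++) {
        if (strcmp(kAliases[i].matching_string, normalized) == 0) {
            return kAliases[i].encoding_name;
        }
    }
    return NULL;
}
```

<p>
	Note that, per the table, <code>euckr</code> selects UHC rather than a separate EUC-KR converter.
</p>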

<h2>Conversion Rules of Individual Encodings</h2>
<h3>ISO-8859</h3>
<p>
	ISO-8859 encodings are converted to the internal character encoding and back.<br>ISO-8859-1 is an exception: when converting to the internal encoding, the input is treated as windows-1252, but when converting from the internal encoding, the output is treated as ISO-8859-1.
</p>
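<p>
	The practical effect of the ISO-8859-1 rule is an asymmetry for bytes 0x80 to 0x9F. The C sketch below illustrates it with the standard windows-1252 assignments for that range. Both function names are hypothetical, and two behaviors are assumptions rather than documented ENC behavior: unassigned windows-1252 bytes are reported as failures, and every code point from 0x0000 to 0x00FF is accepted on output.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* windows-1252 assignments for bytes 0x80..0x9F; 0 marks an unassigned byte. */
static const uint16_t kCp1252High[32] = {
    0x20AC, 0x0000, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x0000, 0x017D, 0x0000,
    0x0000, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x0000, 0x017E, 0x0178
};

/*
 * Input direction: a byte labeled ISO-8859-1 is decoded as windows-1252.
 * Returns 1 on success, 0 if the byte is unassigned (assumed behavior).
 */
static int latin1_byte_to_internal(uint8_t byte, uint16_t *out)
{
    assert(out != NULL);

    if (byte >= 0x80 && byte <= 0x9F) {
        uint16_t mapped = kCp1252High[byte - 0x80];
        if (mapped == 0) {
            return 0;
        }
        *out = mapped;
        return 1;
    }
    *out = byte;  /* Outside 0x80..0x9F both encodings agree. */
    return 1;
}

/*
 * Output direction: only code points 0x0000..0x00FF fit in ISO-8859-1.
 * Returns 1 on success, 0 if the code point has no ISO-8859-1 byte.
 */
static int internal_to_latin1_byte(uint16_t code_point, uint8_t *out)
{
    assert(out != NULL);

    if (code_point > 0x00FF) {
        return 0;
    }
    *out = (uint8_t)code_point;
    return 1;
}
```

<p>
	In this sketch, byte 0x80 decodes to 0x20AC (the euro sign), but 0x20AC cannot be encoded back, so a round trip through ISO-8859-1 is not guaranteed to reproduce the input.
</p>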
<h3>Japanese Character Encoding</h3>
<p>
	The ISO-2022-JP and Shift_JIS conversion rules are compatible with the Windows conversion rules, with certain exceptions.
</p>
<p>
	When converting from the internal character encoding to ISO-2022-JP or Shift_JIS, the following conversions, which Windows does not provide, are also used.
</p>
<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
      <td bgcolor="#C0C0C0">Internal Character Encoding</td>
      <td bgcolor="#C0C0C0">ISO-2022-JP</td>
      <td bgcolor="#C0C0C0">Shift_JIS</td>
    </tr>
    <tr>
      <td>0x203E</td>
      <td>0x7E</td>
      <td>0x7E</td>
    </tr>
    <tr>
      <td>0x2014</td>
      <td>0x213D</td>
      <td>0x815C</td>
    </tr>
    <tr>
      <td>0x2016</td>
      <td>0x2142</td>
      <td>0x8161</td>
    </tr>
    <tr>
      <td>0x2212</td>
      <td>0x215D</td>
      <td>0x817C</td>
    </tr>
    <tr>
      <td>0x301C</td>
      <td>0x2141</td>
      <td>0x8160</td>
    </tr>
  </tbody>
</table>
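<p>
	For reference, the five additional mappings above can be written as a small lookup table in C. The type and function names below are hypothetical. The sketch covers only these five rows and says nothing about how the ENC API handles any other character.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the table above: internal code, ISO-2022-JP code, Shift_JIS code. */
typedef struct {
    uint16_t internal;
    uint16_t iso2022jp;
    uint16_t shift_jis;
} ExtraJpMapping;

static const ExtraJpMapping kExtraJpMappings[] = {
    { 0x203E, 0x007E, 0x007E },
    { 0x2014, 0x213D, 0x815C },
    { 0x2016, 0x2142, 0x8161 },
    { 0x2212, 0x215D, 0x817C },
    { 0x301C, 0x2141, 0x8160 }
};

/* Returns the matching row, or NULL if the code point is not in the table. */
static const ExtraJpMapping *find_extra_jp_mapping(uint16_t code_point)
{
    size_t i;

    for (i = 0; i < sizeof kExtraJpMappings / sizeof kExtraJpMappings[0]; i++) {
        if (kExtraJpMappings[i].internal == code_point) {
            return &kExtraJpMappings[i];
        }
    }
    return NULL;
}
```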
<p>
	ISO-2022-JP supports the following character groups.
</p>
<ul>
	<li>ASCII</li>
	<li>JIS romaji</li>
	<li>JIS X 0208-1983</li>
	<li>Half-width kana</li>
</ul>
<p>
	However, JIS romaji is supported only for one-way conversion from ISO-2022-JP, where it is treated the same as ASCII.<br>Half-width kana is likewise supported only for one-way conversion from ISO-2022-JP; when converting to ISO-2022-JP, half-width kana is converted to full-width kana.
</p>
<p>
	The private area (1880 characters) of ISO-2022-JP and Shift_JIS is mapped to the internal character encoding over the ranges shown below, in code order.<br>Private-area characters can be converted from ISO-2022-JP, but converting them to ISO-2022-JP returns <code>ENC_ERR_NO_MAP_RULE</code>.
</p>

<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
      <td bgcolor="#C0C0C0">Internal Character Encoding</td>
      <td bgcolor="#C0C0C0">ISO-2022-JP</td>
      <td bgcolor="#C0C0C0">Shift_JIS</td>
    </tr>
    <tr>
<td>0xE000 ~ 0xE757</td>
<td>0x7F21 ~ 0x927E</td>
<td>0xF040 ~ 0xF9FC</td>
    </tr>
  </tbody>
</table>
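<p>
	The three ranges each hold 1880 codes: 1880 internal code points starting at 0xE000, 20 ISO-2022-JP rows of 94 cells, and 10 Shift_JIS lead bytes of 188 trail bytes. The C sketch below shows one way the code-order correspondence could be computed. It assumes a straightforward linear mapping and that Shift_JIS trail bytes run 0x40 to 0x7E and 0x80 to 0xFC; the function names are hypothetical and are not ENC API functions.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PRIVATE_FIRST 0xE000u  /* first internal private-area code point */
#define PRIVATE_COUNT 1880u    /* number of private-area characters */

/* Maps an internal private-area code point to a Shift_JIS code.
 * Returns 1 on success, 0 if the code point is outside the private area. */
static int private_to_shift_jis(uint16_t code_point, uint16_t *out)
{
    unsigned offset, lead, trail;

    assert(out != NULL);

    if (code_point < PRIVATE_FIRST || code_point >= PRIVATE_FIRST + PRIVATE_COUNT) {
        return 0;
    }
    offset = code_point - PRIVATE_FIRST;
    lead = 0xF0u + offset / 188u;            /* lead bytes 0xF0..0xF9 */
    trail = offset % 188u;
    trail += (trail < 63u) ? 0x40u : 0x41u;  /* trail bytes skip 0x7F */
    *out = (uint16_t)((lead << 8) | trail);
    return 1;
}

/* Maps an internal private-area code point to an ISO-2022-JP two-byte code.
 * Returns 1 on success, 0 if the code point is outside the private area. */
static int private_to_iso2022jp(uint16_t code_point, uint16_t *out)
{
    unsigned offset, row, cell;

    assert(out != NULL);

    if (code_point < PRIVATE_FIRST || code_point >= PRIVATE_FIRST + PRIVATE_COUNT) {
        return 0;
    }
    offset = code_point - PRIVATE_FIRST;
    row = 0x7Fu + offset / 94u;   /* rows 0x7F..0x92 */
    cell = 0x21u + offset % 94u;  /* cells 0x21..0x7E */
    *out = (uint16_t)((row << 8) | cell);
    return 1;
}
```

<p>
	Under these assumptions the first code point, 0xE000, maps to 0xF040 and 0x7F21, and the last, 0xE757, maps to 0xF9FC and 0x927E, which agrees with the endpoints in the table.
</p>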

<h3>Korean Character Encoding</h3>
<p>
	UHC (CP949) is supported as the Korean character encoding.<br />Be aware that its conversion table is larger than the Japanese or Chinese tables.<br />The supported conversion range can be restricted to either of the following by stripping the conversion table, as described in &quot;Stripping the Conversion Table&quot; below.<br />In either case, the resulting conversion table is about the same size as the Japanese or Chinese table.
</p>
<ul>
	<li>KS X 1001:1992 only</li>
	<li>The UHC code range with Chinese characters removed</li>
</ul>
<p>
	In the former case, the conversion target is the KS X 1001:1992 character set, which includes 2350 Hangul characters.<br />In the latter case, all Hangul characters are supported in exchange for excluding Chinese characters from conversion.
</p>
<p>
	Conversion of the private area is not supported in either direction.
</p>

<h3>Chinese Character Encoding</h3>
<p>
	For the Chinese character encoding, characters not found in GB2312-80 are converted according to the internal fonts of the console.<br />For the full list, see the <a href="./chineseextbl.html">List of Additional Conversion Rules for Chinese Character Encoding</a>.
</p>
<p>
	Conversion of the private area is not supported in either direction.
</p>

<h2>Stripping the Conversion Table</h2>
<p>
	If the program does not need some of the relatively large conversion tables, they can be stripped by defining a macro in the program.<br>Attempting to convert between the internal character encoding and an encoding whose conversion table has been stripped returns <code>ENC_ERR_NOT_LOADED</code>.
</p>
<p>
	The currently supported macros are shown below.
</p>
<table border="1" cellpadding="3" cellspacing="0">
  <tbody>
    <tr>
<td bgcolor="#C0C0C0">Macro</td>
<td bgcolor="#C0C0C0">Character encoding</td>
    </tr>
    <tr>
<td>ENC_STRIP_TABLE_JP</td>
<td>ISO-2022-JP<br>Shift_JIS</td>
    </tr>
    <tr>
<td>ENC_STRIP_TABLE_KR_KANJI</td>
<td>UHC Chinese character region</td>
    </tr>
    <tr>
<td>ENC_STRIP_TABLE_KR_UHC</td>
<td>UHC extended Hangul region</td>
    </tr>
    <tr>
<td>ENC_STRIP_TABLE_KR</td>
<td>UHC</td>
    </tr>
    <tr>
<td>ENC_STRIP_TABLE_CN</td>
<td>GB2312</td>
    </tr>
  </tbody>
</table>
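<p>
	The C sketch below is a self-contained mock of the effect these macros have on the caller; it does not use the real ENC headers or functions. Only the macro names come from the table above. The <code>Mock</code> types, the result values, and the assumption that stripping is decided at compile time by <code>#ifdef</code> are illustrative, and where exactly the macro must be defined in a real program is not covered here.
</p>

```c
#include <assert.h>

/* The program defines this macro to strip the GB2312 table. */
#define ENC_STRIP_TABLE_CN

/* Placeholder result codes; the real ENC values are not given in this document. */
typedef enum {
    MOCK_OK = 0,
    MOCK_ERR_NOT_LOADED = 1  /* stands in for ENC_ERR_NOT_LOADED */
} MockResult;

/* Hypothetical identifiers for the large conversion tables. */
typedef enum {
    MOCK_TABLE_JP,
    MOCK_TABLE_KR,
    MOCK_TABLE_CN
} MockTable;

/* Reports whether a conversion that needs the given table could proceed. */
static MockResult mock_table_status(MockTable table)
{
    assert(table == MOCK_TABLE_JP || table == MOCK_TABLE_KR || table == MOCK_TABLE_CN);

    switch (table) {
#ifdef ENC_STRIP_TABLE_JP
    case MOCK_TABLE_JP:
        return MOCK_ERR_NOT_LOADED;
#endif
#ifdef ENC_STRIP_TABLE_KR
    case MOCK_TABLE_KR:
        return MOCK_ERR_NOT_LOADED;
#endif
#ifdef ENC_STRIP_TABLE_CN
    case MOCK_TABLE_CN:
        return MOCK_ERR_NOT_LOADED;
#endif
    default:
        return MOCK_OK;
    }
}
```

<p>
	Because only <code>ENC_STRIP_TABLE_CN</code> is defined in this mock, a GB2312 conversion reports the not-loaded result while Japanese and Korean conversions remain available. Callers that strip tables should be prepared to handle <code>ENC_ERR_NOT_LOADED</code> for the affected encodings.
</p>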

<h2>See Also</h2>
<p><a href="chineseextbl.html">List of Additional Conversion Rules for Chinese Character Encoding</a></p>

<h2>Revision History</h2>
<p>
2008/02/21 Added character codes for Korean and Chinese.<br>2007/02/05 Added a description about stripping conversion tables.<br>2006/11/14 Revised description of the private area.<br>2006/10/24 Initial version.<br>
</p>

<hr><p>CONFIDENTIAL</p></body>
</html>