Tamil All Character Encoding

Tamil All Character Encoding (TACE16) is a 16-bit Unicode-based character encoding scheme for the Tamil language.^[1]^[2]

Keyboard drivers and fonts

The keyboard driver for this encoding scheme is available free of charge on the Tamil Virtual University website.^[3]^[4] It uses the Tamil99 and Tamil Typewriter keyboard layouts, which are approved by the Tamil Nadu Government, and maps input keystrokes to the corresponding characters of the TACE16 scheme.^[2] To read files created using the TACE16 scheme, the corresponding Unicode Tamil fonts for this encoding scheme are also available on the same website.^[3]^[4] These fonts map glyphs not only to the characters of the TACE16 format but also to the present Unicode encoding for both ASCII and Tamil characters, which provides backward compatibility for reading existing files created using the present Unicode encoding scheme for Tamil.

Character set

All characters of this encoding scheme are located in the private use area of the Basic Multilingual Plane of Unicode's Universal Character Set.

Tamil All Character Encoding(TACE16) Character Set
Consonants→ Vowels ↓	E10	E18	E1A	E1F	E20	E21	E22	E23	E24	E25	E26	E27	E28	E29	E2A	E2B	E2C	E2D	E2E	E2F	E30	E31	E32	E33	E34	E35	E36	E37	E38
0	௳	௦	அரைக்கால்	்		க்	ங்	ச்	ஞ்	ட்	ண்	த்	ந்	ப்	ம்	ய்	ர்	ல்	வ்	ழ்	ள்	ற்	ன்	ஜ்	ஶ்	ஷ்	ஸ்	ஹ்	க்ஷ்
1	௴	௧	கால்		அ	க	ங	ச	ஞ	ட	ண	த	ந	ப	ம	ய	ர	ல	வ	ழ	ள	ற	ன	ஜ	ஶ	ஷ	ஸ	ஹ	க்ஷ
2	௵	௨	அரை	ா	ஆ	கா	ஙா	சா	ஞா	டா	ணா	தா	நா	பா	மா	யா	ரா	லா	வா	ழா	ளா	றா	னா	ஜா	ஶா	ஷா	ஸா	ஹா	க்ஷா
3	௶	௩	முக்கால்	ி	இ	கி	ஙி	சி	ஞி	டி	ணி	தி	நி	பி	மி	யி	ரி	லி	வி	ழி	ளி	றி	னி	ஜி	ஶி	ஷி	ஸி	ஹி	க்ஷி
4	௷	௪	அரைவீசம்	ீ	ஈ	கீ	ஙீ	சீ	ஞீ	டீ	ணீ	தீ	நீ	பீ	மீ	யீ	ரீ	லீ	வீ	ழீ	ளீ	றீ	னீ	ஜீ	ஶீ	ஷீ	ஸீ	ஹீ	க்ஷீ
5	௸	௫	வீசம்	ு	உ	கு	ஙு	சு	ஞு	டு	ணு	து	நு	பு	மு	யு	ரு	லு	வு	ழு	ளு	று	னு	ஜு	ஶு	ஷு	ஸு	ஹு	க்ஷு
6	௹	௬	மூவீசம்	ூ	ஊ	கூ	ஙூ	சூ	ஞூ	டூ	ணூ	தூ	நூ	பூ	மூ	யூ	ரூ	லூ	வூ	ழூ	ளூ	றூ	னூ	ஜூ	ஶூ	ஷூ	ஸூ	ஹூ	க்ஷூ
7	௺	௭	அரைமா	ெ	எ	கெ	ஙெ	செ	ஞெ	டெ	ணெ	தெ	நெ	பெ	மெ	யெ	ரெ	லெ	வெ	ழெ	ளெ	றெ	னெ	ஜெ	ஶெ	ஷெ	ஸெ	ஹெ	க்ஷெ
8	பௌர்ணமி	௮	ஒருமா	ே	ஏ	கே	ஙே	சே	ஞே	டே	ணே	தே	நே	பே	மே	யே	ரே	லே	வே	ழே	ளே	றே	னே	ஜே	ஶே	ஷே	ஸே	ஹே	க்ஷே
9	அமாவாசை	௯	இரண்டுமா	ை	ஐ	கை	ஙை	சை	ஞை	டை	ணை	தை	நை	பை	மை	யை	ரை	லை	வை	ழை	ளை	றை	னை	ஜை	ஶை	ஷை	ஸை	ஹை	க்ஷை
A	கார்த்திகை	௰	மும்மா	ொ	ஒ	கொ	ஙொ	சொ	ஞொ	டொ	ணொ	தொ	நொ	பொ	மொ	யொ	ரொ	லொ	வொ	ழொ	ளொ	றொ	னொ	ஜொ	ஶொ	ஷொ	ஸொ	ஹொ	க்ஷொ
B	ராஜ	௱	நாலுமா	ோ	ஓ	கோ	ஙோ	சோ	ஞோ	டோ	ணோ	தோ	நோ	போ	மோ	யோ	ரோ	லோ	வோ	ழோ	ளோ	றோ	னோ	ஜோ	ஶோ	ஷோ	ஸோ	ஹோ	க்ஷோ
C	ௐ	௲	முந்திரி	ௌ	ஔ	கௌ	ஙௌ	சௌ	ஞௌ	டௌ	ணௌ	தௌ	நௌ	பௌ	மௌ	யௌ	ரௌ	லௌ	வௌ	ழௌ	ளௌ	றௌ	னௌ	ஜௌ	ஶௌ	ஷௌ	ஸௌ	ஹௌ	க்ஷௌ
D			அரைக்காணி		ஃ																								ஸ்ரீ
E			காணி
F			முக்காணி

Note:
	Newly added. Not present in Unicode_v6.3.
	Allocated for researches(NLP)
	For future use
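
Each cell in the table can be addressed by appending the row digit to the column prefix. The following is a minimal sketch of that reading, assuming the table layout above; the helper name `tace16_code_point` is hypothetical and not part of any TACE16 specification or library.

```python
import unicodedata


def tace16_code_point(column: int, row: int) -> int:
    """Combine a table column prefix (e.g. 0xE21) with a row digit (0x0-0xF).

    Hypothetical helper for illustration only.
    """
    if not 0 <= row <= 0xF:
        raise ValueError("row must be a single hexadecimal digit")
    return (column << 4) | row


cp = tace16_code_point(0xE21, 0x3)    # column E21, row 3 in the table
print(f"{cp:04X}")                    # E213
print(unicodedata.category(chr(cp)))  # 'Co' marks a private-use code point
```

Because the code points are private-use, standard Unicode tooling reports them only as category `Co`; any Tamil meaning comes from the TACE16 fonts and drivers.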

Analysis of TACE16 over present Unicode standard for Tamil language

Issues with the present Unicode for Tamil language

The present Unicode standard for Tamil is considered inadequate for the efficient and effective use of Tamil in computers, for the following reasons:^[1]

Unicode Tamil has code positions for only 31 of the 247 Tamil characters. These 31 characters comprise 12 vowels, 18 agara-uyirmey and one aytham; the five Grantha agara-uyirmey that are also given code space in Unicode Tamil are not counted here. The other Tamil characters have to be rendered using separate software. Only 10% of the Tamil characters are provided code space in the present Unicode Tamil; the 90% of Tamil characters that are used in general text interchange are not provided code space.
The uyir-meys that are left out of the present Unicode Tamil are simple characters, just as A, B, C, D are characters in English. Uyir-meys are not glyphs, ligatures or conjunct characters, as assumed in Unicode. ka, kA, ki, kI, etc., are characters in Tamil.
In any plain Tamil text, vowel-consonants (uyir-meys) form 64 to 70%, vowels (uyir) form 5 to 6% and consonants (meys) form 25 to 30%. Breaking high-frequency letters like vowel-consonants into glyphs is highly inefficient.
This type of encoding, which requires a rendering engine to realize a character during computation, is not suitable for applications such as system software development in Tamil, searching and sorting, and natural language processing (NLP) in Tamil. It consumes extra time and space, making the computing process highly inefficient. Such applications require a Level-1 implementation, in which all the characters of a language have code positions in the encoding, as is the case for English.
This encoding is based on ISCII (1988) and therefore, the characters are not in the natural order of sequence. It requires a complex collation algorithm for arranging them in the natural order of sequence.
It uses multiple code points to render single characters. Multiple code points lead to security vulnerabilities and ambiguous combinations, and require the use of normalization.
Simple operations such as counting letters, sorting and searching are inefficient.
It requires hidden characters of the ZWJ/ZWNJ type.
It needs an exception table to prevent illegal combinations of code points.
The Unicode Indic block is built on an enormous, complex, error-prone edifice, based on an encoding that is NOT built to last.
The very first code point says "Tamil Sign Anusvara - Not used in Tamil".
Collation was assumed to be the same as for Devanagari, and ambiguous encoding is incorrectly used to render the same character.
It encodes 23 vowel-consonants (23 consonants + அ) and calls them consonants, against Tamil grammar.
Unnatural for Speech to Text/Text to Speech.
Inefficient to store, transmit and retrieve (for example, file reading and writing, the Internet, etc.).
Complex processing hinders development.
Needs normalization for string comparison.
A sequence of characters may correspond to a single glyph, that is, ச + ◌ெ + ◌ா = சொ. Characters are not graphemes. According to Unicode, சொ is a grapheme, but ச, ◌ெ and ◌ா are characters.
Requires Dynamic Composition - a text element encoded as a sequence of a base character followed by one or more combining marks.
There are two methods of rendering the Vowel Consonants. This leads to ambiguity in rendering characters.
The present Unicode is not efficient for parsing. For example, the name திருவள்ளுவர் looks like it should have seven letters. However, according to Unicode, this name has twelve characters: த ◌ி ர ◌ு வ ள ◌் ள ◌ு வ ர ◌்
To properly count the letters in this name, an expert developer had to write a complex program and present it as a technical paper at a Tamil computing conference. By comparison, counting letters in an English word is an exercise left to a beginning programmer. Such problems arise because a simple script such as Tamil is treated as a complex script by Unicode. For example, in the Python library open-tamil,^[5] which uses the present Unicode standard for Tamil, counting the Tamil letters in a given text requires first calling the function tamil.utf8.get_letters to parse the text into a list and then taking the length of that list as the letter count.^[6] This kind of complex programming logic, or an additional framework layer, is needed when a simple script such as Tamil is treated as a complex script.
The Unicode standard's policy is to encode only characters, not glyphs.^[7] However, the Unicode Tamil standard includes the vowel signs as combining characters. These signs, which have no meaning to a Tamil reader on their own, are displayed as is by character shaping engines that detect a blank space between them and a base character. Thus Unicode introduces the dotted circle as a Tamil character.
Unicode Tamil is not fully supported on many platforms, primarily because Tamil is treated as a complex script that requires complex processing.
Since all the above-mentioned inefficiencies consume more processor cycles than necessary, they increase the overall lifetime power (electricity) usage of a machine that processes Unicode Tamil. For example, processing the single Tamil character kI (கீ) requires processing both a consonant and a vowel modifier, which doubles the processor cycles consumed.
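
The letter-counting and normalization points in the list above can be reproduced with Python's standard library. This is only a sketch under a stated simplification: it counts every code point that is not a combining mark as one letter, which works for the example name but does not treat conjuncts such as க்ஷ or ஸ்ரீ as single letters. The function name `count_letters` is an assumption made for illustration.

```python
import unicodedata

# திருவள்ளுவர் written out as explicit Unicode code points.
NAME = "\u0ba4\u0bbf\u0bb0\u0bc1\u0bb5\u0bb3\u0bcd\u0bb3\u0bc1\u0bb5\u0bb0\u0bcd"


def count_letters(text: str) -> int:
    """Count code points that are not combining marks (categories Mn/Mc/Me).

    Simplified illustration; conjuncts are not merged into one letter.
    """
    return sum(1 for ch in text if not unicodedata.category(ch).startswith("M"))


print(len(NAME))            # 12 code points
print(count_letters(NAME))  # 7 letters under the simplification

# Two encodings of the same syllable: one vowel sign versus two separate signs.
one_sign = "\u0b9a\u0bca"         # ச + ொ
two_signs = "\u0b9a\u0bc6\u0bbe"  # ச + ெ + ா
print(one_sign == two_signs)                                # False
print(unicodedata.normalize("NFC", two_signs) == one_sign)  # True
```

The last two lines show why string comparison on Unicode Tamil needs a normalization step before the two spellings compare equal.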

Analysis of TACE16 over Unicode Tamil

The following data compare the current Unicode encoding for Tamil with TACE16 in e-governance and browsing applications:^[1]

TACE16 is more efficient than Unicode Tamil by about 5.46 to 11.94 percent for data storage applications.
TACE16 is more efficient than Unicode Tamil by about 18.69 to 22.99 percent for sorting index data.
TACE16 is more efficient than Unicode Tamil by about 25.39% when all the data is in Tamil. The default (binary) collation sequence followed when using the code space values in the New TACE16 is not in Tamil dictionary order. Some of the uyir-meys (agara-uyirmeys) take precedence over vowels and other uyir-meys in the New TACE16, the vowels and agara-uyirmeys being in the 0B80 - 0B8F block and the other uyir-meys being in the 0800 to 08FF range. For this reason, sorting Unicode data looks better than TACE16 data.
TACE16 is faster than Unicode Tamil in sorting by about 0.31 to 16.96 percent.
Index creation on TACE16 data is 36.7% faster than on Unicode data.
For full-key search on indexed fields, TACE16 performed better than Unicode Tamil by up to 24.07%. For non-indexed fields, TACE16 also performed better than Unicode Tamil by up to 20.9%.
Rendering of static Tamil Data was fine with TACE16.

Advantages of TACE16 over Unicode Tamil

The TACE16 character encoding scheme not only overcomes the issues with the present Unicode encoding standard for Tamil described above, but also provides major performance improvements in both processing time and processing space, which are the main factors affecting the efficient and speedy execution of any computer program. The system has the following additional advantages:^[1]

The encoding is Universal since it encompasses all characters that are found in general Tamil text interchange.
The Collation is sequential in accordance with the code value.
The encoding is unambiguous.
Any given code point always represents the same character.
There is no ambiguity as in the present Unicode Tamil.

Because of the issues attributed to the Unicode Tamil encoding, a proposal was submitted to re-encode Tamil.^[8] The Unicode Consortium rejected it, stating that re-encoding would be damaging and that there was no convincing evidence that the Unicode Tamil encoding is deficient.^[9]

This system has the following advantages for computer programming:

The basic software design to accommodate Tamil characters and their processing is simplified.
Sorting and searching are very simple.
For a machine, TACE16 takes fewer processor cycles (and in turn less electricity) than Unicode Tamil. In that sense, TACE16 is greener than Unicode Tamil.
TACE16 allows programming based on Tamil grammar, which is not easy in Unicode Tamil (it needs extra framework development).
The encoding is very efficient to parse: characters can be parsed by simple arithmetic operations. Of the two methods that follow, the second is very efficient in terms of performance over a large character set. Both methods follow the basic Tamil grammar rule that consonant + vowel = vowel-consonant (uyirmei), which is not followed in Unicode Tamil.

Method 1 (by simple arithmetic operations):
 க் + இ = கி
 E210 (க்) + E203 (இ) - E200 (Constant) = E213 (கி)
Method 2 (by bitwise operations):
 க் (E210) + இ (E203) = கி (E213)
 E210 (க்) | (E203 (இ) & 000F (Constant)) = E213 (கி)
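
Both methods can be written as one-line functions. The sketch below assumes the code points shown in the character set table; the function and constant names are hypothetical and chosen only for illustration.

```python
VOWEL_ROW = 0xE200  # assumed base of the standalone-vowel column (E20x)


def compose_add(consonant: int, vowel: int) -> int:
    """Method 1: consonant + vowel - E200."""
    return consonant + vowel - VOWEL_ROW


def compose_or(consonant: int, vowel: int) -> int:
    """Method 2: OR the vowel's low nibble into the consonant code."""
    return consonant | (vowel & 0x000F)


KA_PULLI = 0xE210  # க்
VOWEL_I = 0xE203   # இ

print(f"{compose_add(KA_PULLI, VOWEL_I):04X}")  # E213
print(f"{compose_or(KA_PULLI, VOWEL_I):04X}")   # E213
```

Both functions rely on every pure consonant sitting at a code point whose low four bits are zero, so the vowel index can simply be added or OR-ed in.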

It is very efficient to divide a vowel-consonant (UyirMei) character into its corresponding vowel and consonant. This is very efficient in terms of performance over large data.
```
  /* To get Vowel */
  E213 (கி) & 'F20F (Constant)' = E203 (இ)

  /* To get Consonant */
  E213 (கி) & 'FFF0(Constant)' = E210 (க்)
```
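
The same split can be expressed in Python using the two masks given above. This is an illustrative sketch; `split_uyirmei` is a hypothetical name, and the function assumes its input is a vowel-consonant code point from the table.

```python
from typing import Tuple

VOWEL_MASK = 0xF20F      # mask from the text: maps E2xy/E3xy to E20y
CONSONANT_MASK = 0xFFF0  # mask from the text: clears the vowel index


def split_uyirmei(code: int) -> Tuple[int, int]:
    """Return (consonant, vowel) code points for a TACE16 vowel-consonant."""
    return code & CONSONANT_MASK, code & VOWEL_MASK


consonant, vowel = split_uyirmei(0xE213)  # கி
print(f"{consonant:04X} {vowel:04X}")     # E210 E203
```

The vowel mask also works for the columns E30 to E38, because the bit that distinguishes E3 from E2 is cleared by F2.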

It is very efficient to find whether a character is vowel or consonant or vowel-consonant (UyirMei) or numbers.

  /* | - Bitwise OR
   * & - Bitwise AND
   * ! - Logical NOT
   * ^ - Bitwise XOR
   * ||- Conditional OR
   * &&- Conditional AND
   */
  c = the TACE16 encoding for a Tamil character

  /* To check whether a character is vowel */
  /* Method 1 */
  ((c >= E201) && (c <= E20C)) == true // => Vowel
  /* Method 2 - If code positions E200, E20E, E20F are not used for any other purpose*/
  (((c & 'E20F (Constant)')==c) && (c != E20D)) == true // => Vowel
  ((!((c & 'E20F (Constant)')^c)) && (c != E20D)) == true // => Vowel

  /* To check whether a character is consonant or Vowel-consonant(UyirMei) */
  x = (c & '000F (Constant)') // If c is Vowel or Vowel-Consonant, then x = Unique number for each vowel starting from 1
  (((c >= E210) && (c <= E38C)) && (x == 0)) == true // => Consonant
  (((c >= E210) && (c <= E38C)) && ((x >= 1) && (x <= 12))) == true // => Vowel-Consonant(UyirMei)

  /* To check whether a character is Tamil number */
  /* Method 1 */
  ((c >= E180) && (c <= E18C)) == true // => Tamil Number
  /* Method 2 */
  // If code positions E18D-E18F are not used for any other purpose
  // Note: the mask tests below also hold for code positions E100-E10F, so c must not come from that column
  (c & 'E18F (Constant)') == c // => Tamil Number
  (!((c & 'E18F (Constant)')^c)) == true // => Tamil Number
  // If code positions E18D-E18F are used for any other purpose, then either Method 1 or the method below can be used
  ((!((c & 'E18F (Constant)')^c)) && ((c & '000F (Constant)') <= 12)) == true // => Tamil Number
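
The range-based checks (Method 1 in each case) can be combined into a single classifier. The sketch below assumes the code point ranges listed above; the function name `classify` and its return labels are hypothetical.

```python
def classify(c: int) -> str:
    """Classify a TACE16 code point using the range checks from the text."""
    if 0xE201 <= c <= 0xE20C:
        return "vowel"
    if 0xE180 <= c <= 0xE18C:
        return "number"
    if 0xE210 <= c <= 0xE38C:
        x = c & 0x000F  # 0 for a pure consonant, 1-12 for the vowel index
        if x == 0:
            return "consonant"
        if 1 <= x <= 12:
            return "vowel-consonant"
    return "other"  # aytham, symbols, fractions, unassigned positions


for code in (0xE203, 0xE210, 0xE213, 0xE185, 0xE20D):
    print(f"{code:04X} -> {classify(code)}")
```

Range comparisons are used here instead of the mask tests because they do not depend on which neighbouring code positions are left unused.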

It is very easy to convert numbers to Tamil numbers (new Tamil number format) and vice versa (same as Unicode Tamil).

  /* To convert a number to new format of Tamil number and vice versa, direct digit to digit conversion is enough */

  /* To convert a number to new format of Tamil number */
  n = single digit number (0-9)
  /* Method 1 */
  (n + 'E180 (Constant)') // => Tamil Number
  /* Method 2 */
  (n | 'E180 (Constant)') // => Tamil Number

  /* To convert new format of Tamil number to a number */
  c = single digit Tamil number character(௦-௯)
  (c & '000F (Constant)') // => Number
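
Applied digit by digit, the conversion extends to whole numbers. This is a minimal sketch assuming the digit positions E180 to E189 from the table and the positional ("new") Tamil number format; the function names are hypothetical.

```python
TAMIL_DIGIT_BASE = 0xE180  # assumed code point of Tamil digit zero in TACE16


def to_tace16_digits(number: int) -> str:
    """Convert a non-negative integer to TACE16 digits, one per decimal digit."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    return "".join(chr(TAMIL_DIGIT_BASE | int(d)) for d in str(number))


def from_tace16_digits(text: str) -> int:
    """Convert a string of TACE16 digits back to an integer."""
    return int("".join(str(ord(ch) & 0x000F) for ch in text))


encoded = to_tace16_digits(2024)
print([f"{ord(ch):04X}" for ch in encoded])  # ['E182', 'E180', 'E182', 'E184']
print(from_tace16_digits(encoded))           # 2024
```

`from_tace16_digits` does not validate its input; it assumes every character lies in E180 to E189.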

Alternative Claims

Open-Tamil

The open-tamil project^[10] provides many common operations, such as extracting letters from a UTF-8 encoded Unicode string, sorting and searching. Even though the project claims Level-1 compliance for Tamil text processing without using TACE16, it is still written on top of the extra programming logic that the present Unicode standard for Tamil requires.

   #!/usr/bin/env python3
   # -*- coding: utf-8 -*-
   import tamil.utf8 as utf8

   # Split the text into Tamil letters (not code points), then write each
   # letter to the file followed by a space and echo it to the console.
   with open('singl', 'w', encoding='utf-8') as ff:
       letters = utf8.get_letters("கூவிளம் என்பது என்ன சீர்")
       for letter in letters:
           ff.write(letter)
           print(letter)
           ff.write(' ')

This generates the output: கூ வி ள ம் எ ன் ப து எ ன் ன சீ ர்

References

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

[1] Report on the final recommendations of the task force on TACE16

[2] Tamil Nadu Government's Tender Document for development of Tamil fonts and Tamil keyboard driver for 16-bit encodings (Unicode and TACE16)

[3] http://www.tamilvu.org/tkbd/index.htm

[4] Tamil Nadu Government's Order (G.O.), Keyboard Drivers and Fonts

[5] https://github.com/arcturusannamalai/open-tamil open-tamil

[6] https://ezhillang.wordpress.com/2014/01/26/open-tamil-text-processing-%E0%AE%89%E0%AE%B0%E0%AF%88-%E0%AE%AA%E0%AE%95%E0%AF%81%E0%AE%AA%E0%AF%8D%E0%AE%AA%E0%AE%BE%E0%AE%AF%E0%AF%8D%E0%AE%B5%E0%AF%81/ tamil.utf8.get_letters

[7] https://ezhillang.wordpress.com/2014/01/26/open-tamil-text-processing-%E0%AE%89%E0%AE%B0%E0%AF%88-%E0%AE%AA%E0%AE%95%E0%AF%81%E0%AE%AA%E0%AF%8D%E0%AE%AA%E0%AE%BE%E0%AE%AF%E0%AF%8D%E0%AE%B5%E0%AF%81/

[8] https://www.unicode.org/L2/L2012/12033-tamil-presentation.pdf

[9] http://unicode.org/alloc/nonapprovals.html

[10] https://pypi.org/project/Open-Tamil/ open-tamil project
