Shift JIS

Shift_JIS-2004
Alias(es)	Shift_JISx0213
Language(s)	Japanese, Ainu, English, Russian
Standard	JIS X 0213
Extends	Shift_JIS (1997); JIS X 0201 (8-bit)
Transforms / Encodes	JIS X 0213
Preceded by	Shift_JIS (1997)

Shift JIS
MIME / IANA	Shift_JIS
Language(s)	Primarily Japanese, but also supporting English, Russian
Standard	JIS X 0208:1997 Appendix 1
Classification	Extended ISO 646,[a] Variable-width encoding, CJK encoding
Extends	JIS X 0201 8-bit format
Transforms / Encodes	JIS X 0208
Succeeded by	Shift_JIS-2004 (JIS); Windows-31J (web)
	[a] Not in the strictest sense of the term, as ASCII bytes can appear as trail bytes.

Shift JIS (Shift Japanese Industrial Standards, also SJIS, MIME name Shift_JIS) is a character encoding for the Japanese language, originally developed by a Japanese company called ASCII Corporation in conjunction with Microsoft and standardized as JIS X 0208 Appendix 1. 0.4% of all web pages used Shift JIS in September 2018, a decline from 1.3% in July 2014.^[1]

Description

Shift JIS is based on character sets defined within JIS standards JIS X 0201:1997 (for the single-byte characters) and JIS X 0208:1997 (for the double-byte characters). The lead bytes for the double-byte characters are "shifted" around the 63 halfwidth katakana characters in the single-byte range 0xA1 to 0xDF. The single-byte characters 0x00 to 0x7F match the ASCII encoding, except for a yen sign (U+00A5) at 0x5C and an overline (U+203E) at 0x7E in place of the ASCII character set's backslash and tilde respectively. The single-byte characters from 0xA1 to 0xDF map to the half-width katakana characters found in JIS X 0201.
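The single-byte part of the mapping described above is small enough to write out as a lookup function. The following is a minimal Python sketch, not a reference decoder. The function name `decode_jis_x0201` is hypothetical. The sketch also assumes the usual Unicode mapping of the JIS X 0201 katakana onto U+FF61–U+FF9F, which the text above does not state.

```python
from typing import Optional


def decode_jis_x0201(byte: int) -> Optional[str]:
    """Return the JIS X 0201 character for a single Shift JIS byte, or None.

    None means the byte is not a character on its own: it is either the
    first byte of a double-byte character or an unused value.
    (Hypothetical helper for illustration.)
    """
    if byte == 0x5C:
        return "\u00a5"  # YEN SIGN replaces the ASCII backslash
    if byte == 0x7E:
        return "\u203e"  # OVERLINE replaces the ASCII tilde
    if byte < 0x80:
        return chr(byte)  # every other byte below 0x80 matches ASCII
    if 0xA1 <= byte <= 0xDF:
        # 63 half-width katakana, assumed to map in order onto U+FF61..U+FF9F
        return chr(0xFF61 + (byte - 0xA1))
    return None  # lead byte of a double-byte character, or unused


print([decode_jis_x0201(b) for b in (0x41, 0x5C, 0x7E, 0xB1, 0x81)])
# ['A', '¥', '‾', 'ｱ', None]
```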

HTML written in Shift JIS can still be interpreted to some extent when it is incorrectly labelled as ASCII. A charset declaration at the top of the document itself can also be read before the encoding is known. This works because the characters that delimit HTML tags and fields (<, >, /, ", & and ;) are encoded as the same single bytes as in ASCII, and those bytes never appear inside two-byte sequences.

Shift JIS can be used in string literals in programming languages such as C, but two things must be taken into account. First, the escape character 0x5C, normally the backslash, is the half-width yen sign (¥) in Shift JIS. A programmer aware of this can write printf("ハローワールド¥n"); (where ハローワールド is "Hello, world" and ¥n is an escape sequence), assuming the I/O system supports Shift JIS output. Second, the byte 0x5C causes problems when it appears as the second byte of a two-byte character. In that position it is interpreted as the start of an escape sequence, which corrupts the string, unless it is followed by another 0x5C.
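The second problem can be located mechanically. The approach is to walk the bytes one character at a time and flag every 0x5C that is a trail byte rather than a standalone character. This is a minimal Python sketch. `is_lead_byte` and `trail_5c_offsets` are hypothetical helpers. The lead-byte ranges 0x81–0x9F and 0xE0–0xFC are an assumption that covers standard Shift JIS plus common vendor extensions. ソ (0x83 0x5C) and 表 (0x95 0x5C) are two of the affected characters.

```python
from typing import List


def is_lead_byte(b: int) -> bool:
    # Assumption: 0x81-0x9F and 0xE0-0xFC start two-byte characters
    # (standard Shift JIS plus the common vendor extensions).
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def trail_5c_offsets(data: bytes) -> List[int]:
    """Return the offsets where 0x5C occurs as the second byte of a character."""
    offsets = []
    i = 0
    while i < len(data):
        if is_lead_byte(data[i]) and i + 1 < len(data):
            if data[i + 1] == 0x5C:
                offsets.append(i + 1)
            i += 2  # skip the whole two-byte character
        else:
            i += 1  # single-byte character (JIS-Roman or katakana)
    return offsets


# ソ and 表 both end in 0x5C; the final 0x5C is a genuine single-byte character.
sample = "ソ表".encode("shift_jis") + b"\\"
print(trail_5c_offsets(sample))  # [1, 3]
```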

Shift JIS requires an 8-bit clean medium for transmission. It is fully backward compatible with the legacy JIS X 0201 single-byte encoding: it supports half-width katakana, and any valid JIS X 0201 string is also a valid Shift JIS string. For two-byte characters, however, Shift JIS only guarantees that the first byte has the high bit set (0x80–0xFF). The second byte may have its high bit either set or clear. Byte values 0x40–0x7E can occur as second bytes, and the same values are used for ASCII characters, so reliable detection of Shift JIS is difficult. String searching is also difficult. The same byte value can be either a first or a second byte, so a simple search can match the second byte of one character together with the first byte of the next, which is not a real character. String search algorithms must therefore be tailored to Shift JIS.
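A tailored search only needs to make sure that every candidate match starts on a character boundary. A minimal Python sketch follows. `sjis_find` is a hypothetical name, and the lead-byte ranges 0x81–0x9F and 0xE0–0xFC are an assumption that includes common vendor extensions. The sample is chosen so that the second byte of ャ (0x83 0x83) and the first byte of 表 (0x95 0x5C) together spell ヵ (0x83 0x95).

```python
def is_lead_byte(b: int) -> bool:
    # Assumption: lead bytes of two-byte characters, including vendor extensions.
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def sjis_find(haystack: bytes, needle: bytes) -> int:
    """Like bytes.find, but only accepts matches that start on a character boundary."""
    i = 0
    while i <= len(haystack) - len(needle):
        if haystack.startswith(needle, i):
            return i
        # Step over a whole character, never into the middle of one.
        i += 2 if is_lead_byte(haystack[i]) else 1
    return -1


text = "ャ表".encode("shift_jis")   # 0x83 0x83 0x95 0x5C
needle = "ヵ".encode("shift_jis")  # 0x83 0x95
print(text.find(needle))           # 1: a false match straddling ャ and 表
print(sjis_find(text, needle))     # -1
```

Because the needle is itself well-formed Shift JIS, a match that starts on a boundary also ends on one. Checking the start position is therefore enough.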

On the other hand, the competing 8-bit format EUC-JP does not support single-byte halfwidth katakana. In exchange, it allows a much cleaner and more direct conversion to and from JIS X 0208 code points: all bytes with the high bit set are parts of double-byte characters, and all bytes in the ASCII range represent single-byte characters.
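For JIS X 0208 characters, the "direct conversion" amounts to one bit operation per byte. The Shift JIS transformation, by contrast, depends on whether the JIS row is odd or even. The following is a minimal Python sketch covering JIS X 0208 characters only; both function names are hypothetical.

```python
from typing import Tuple


def jis_to_euc_jp(j1: int, j2: int) -> bytes:
    """Encode a JIS X 0208 byte pair (each 0x21..0x7E) as EUC-JP."""
    # EUC-JP simply sets the high bit of both JIS bytes.
    return bytes([j1 | 0x80, j2 | 0x80])


def euc_jp_to_jis(pair: bytes) -> Tuple[int, int]:
    """Recover the JIS X 0208 byte pair from a two-byte EUC-JP character."""
    return pair[0] & 0x7F, pair[1] & 0x7F


print(jis_to_euc_jp(0x25, 0x22))   # ア: b'\xa5\xa2'
print(euc_jp_to_jis(b"\xa5\xa2"))  # (37, 34)
```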

Unicode also avoids some of the disadvantages of Shift JIS. It has no ambiguous versions: new characters are assigned to unused code points by a single organisation. Private use areas are clearly designated, will never be used for standard characters, and are rarely needed because Unicode's coverage is so comprehensive. With Shift JIS, by contrast, companies extended the encoding in parallel. UTF-8-encoded Unicode is backward compatible with ASCII, including at 0x5C, and does not have the string search problem.
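Both claims about UTF-8 can be checked side by side with Python's built-in `shift_jis` and `utf-8` codecs. The example reuses a string in which the second byte of ャ and the first byte of 表 happen to spell ヵ in Shift JIS. It is a small illustration, not a benchmark.

```python
text, needle = "ャ表", "ヵ"

# In Shift JIS, a naive byte search finds a match that is not a real character.
print(text.encode("shift_jis").find(needle.encode("shift_jis")))  # 1
# In UTF-8, lead and continuation bytes are distinct, so the same search is safe.
print(text.encode("utf-8").find(needle.encode("utf-8")))          # -1
# Every byte of a multi-byte UTF-8 sequence is >= 0x80, so 0x5C never appears inside one.
print(0x5C in "ソ表".encode("utf-8"))                              # False
print(0x5C in "ソ表".encode("shift_jis"))                          # True
```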

For a double-byte JIS sequence $j_{1}j_{2}$,^[2] the transformation to the corresponding Shift JIS bytes $s_{1}s_{2}$ is:

$$
s_{1}=\begin{cases}
\left\lfloor \frac{j_{1}+1}{2}\right\rfloor + 112 & \text{if } 33\leq j_{1}\leq 94\\
\left\lfloor \frac{j_{1}+1}{2}\right\rfloor + 176 & \text{if } 95\leq j_{1}\leq 126
\end{cases}
$$

$$
s_{2}=\begin{cases}
j_{2}+31+\left\lfloor \frac{j_{2}}{96}\right\rfloor & \text{if } j_{1} \text{ is odd}\\
j_{2}+126 & \text{if } j_{1} \text{ is even}
\end{cases}
$$
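The two formulas translate directly into code. This Python sketch is a transcription for illustration; the function name `jis_to_sjis` is hypothetical. It uses only integer floor division, so each branch matches one case of the formulas above.

```python
def jis_to_sjis(j1: int, j2: int) -> bytes:
    """Apply the JIS X 0208 -> Shift JIS transformation to one character."""
    if not (33 <= j1 <= 126 and 33 <= j2 <= 126):
        raise ValueError("j1 and j2 must each be in 33..126")
    # First byte: two JIS rows share each lead byte; the +176 case skips 0xA0-0xDF.
    if j1 <= 94:
        s1 = (j1 + 1) // 2 + 112
    else:
        s1 = (j1 + 1) // 2 + 176
    # Second byte: odd rows use 0x40-0x9E (skipping 0x7F), even rows use 0x9F-0xFC.
    if j1 % 2 == 1:
        s2 = j2 + 31 + j2 // 96
    else:
        s2 = j2 + 126
    return bytes([s1, s2])


print(jis_to_sjis(0x25, 0x22).hex())  # ア (JIS 0x2522) -> 8341
print(jis_to_sjis(0x7E, 0x7E).hex())  # last cell of the 94x94 space -> effc
```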

Multiple versions

Figure: Relationship between Shift_JIS variants on the PC and related encodings, including intersections and other subsets. Names given are descriptive.

Many different versions of Shift JIS exist. There are two areas for expansion:

Firstly, JIS X 0208 does not fill the whole 94×94 space encoded for it in Shift JIS, so there is room for more characters there. These are really extensions to JIS X 0208 rather than to Shift JIS itself.

Secondly, Shift JIS has more encoding space than is needed for JIS X 0201 and JIS X 0208 (see § Shift JIS byte map below), and this space can be, and is, used for yet more characters.

Windows-932 / Windows-31J

The most popular extension is Windows code page 932 (a CCSID also used for IBM's extension to Shift JIS), which is registered with the IANA as "Windows-31J",^[3] separately from Shift JIS. This was popularized by Microsoft, although Microsoft itself does not recognize the Windows-31J name and instead calls that variation "shift_jis".^[4] IBM's code page 943 includes the same double-byte codes as Microsoft's code page 932, while IBM's code page 932 includes fewer extensions.^[5]

Windows-31J assigns 0x5C to U+005C REVERSE SOLIDUS (the backslash), and 0x7E to U+007E TILDE, following US-ASCII.^[6] However, most localised fonts on Windows display U+005C as a Yen sign for JIS X 0201 compatibility.^[7]^[8] It includes several extensions, namely "NEC special characters (Row 13), NEC selection of IBM extensions (Rows 89 to 92), and IBM extensions (Rows 115 to 119)",^[3] in addition to setting some encoding space aside for end user definition.^[9]
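Python ships codecs named `cp932` (Microsoft's Windows-31J) and `shift_jis`. The second is assumed here to follow plain JIS X 0208 without vendor rows. Together they make the extension rows easy to probe. `encodable` is a hypothetical helper. ① is used because it sits in the NEC special characters row (row 13, lead byte 0x87).

```python
def encodable(text: str, codec: str) -> bool:
    """True if text can be encoded with the given Python codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# ① is an NEC special character (row 13): present in cp932, absent from JIS X 0208.
print("①".encode("cp932"))          # b'\x87@'
print(encodable("①", "shift_jis"))  # False
# cp932 maps 0x5C and 0x7E to the ASCII backslash and tilde.
print(b"\x5c\x7e".decode("cp932") == "\\~")  # True
```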

Windows codepage 932 is the version used in the W3C/WHATWG encoding standard used by HTML5 (including such "formerly proprietary extensions from IBM and NEC"),^[10] which also treats the label "shift_jis" interchangeably with "windows-31j" with the intent of being "compatible with deployed content".^[11]

MacJapanese

The version of Shift-JIS originating from the classic Mac OS (known as x-mac-japanese, Code page 10001^[4] or MacJapanese) assigned the tilde to 0x7E (following US-ASCII, not JIS X 0201, which assigns the overline here), but the Yen sign to 0x5C (as in JIS X 0201 and standard Shift JIS). It also extended JIS X 0201 by assigning the backslash to 0x80 (corresponding to 0x5C in US-ASCII), the non-breaking space to 0xA0, the copyright sign to 0xFD, the trademark symbol to 0xFE and the half-width horizontal ellipsis to 0xFF. It also added extended double-byte characters, including 53 vertical presentation forms in the Shift_JIS range 0xEB41–0xED96, at 84 JIS rows down from their canonical forms, and 260 special characters in the Shift_JIS range 0x8540–0x886D.^[12]

However, certain Mac OS typefaces used other variants. Sai Mincho and Chu Gothic include additional vertical presentation forms and a different set of extended special characters, some of which were only available in the printer versions of the fonts. Older versions of Maru Gothic and Hon Mincho from System 7.1 encoded vertical presentation forms at 10 (not 84) JIS rows down from their canonical forms and did not include the special character extensions; this was subsequently changed.^[12]^[13]
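The row-offset rule can be made concrete with a small sketch. It converts a Shift JIS code to a JIS row and cell, moves the row down, and converts back. The function names are hypothetical, and no MacJapanese codec is assumed. The sketch only computes a position: which characters actually have vertical forms is defined by Apple's mapping table. Applied to 、 (0x8141) and ヶ (0x8396) with the 84-row offset, it gives 0xEB41 and 0xED96, which matches the two ends of the range given above.

```python
from typing import Tuple


def sjis_to_kuten(code: int) -> Tuple[int, int]:
    """Split a two-byte Shift JIS code into JIS row (ku) and cell (ten), 1..94."""
    s1, s2 = code >> 8, code & 0xFF
    # Each lead byte covers an odd row and the even row after it.
    odd_row = (s1 - 0x81) * 2 + 1 if s1 <= 0x9F else (s1 - 0xE0) * 2 + 63
    if s2 >= 0x9F:
        return odd_row + 1, s2 - 0x9E
    if s2 <= 0x7E:
        return odd_row, s2 - 0x3F
    return odd_row, s2 - 0x40  # 0x80..0x9E; 0x7F is skipped


def kuten_to_sjis(row: int, cell: int) -> int:
    """Inverse of sjis_to_kuten, equivalent to the JIS -> Shift JIS formulas."""
    s1 = (row + 1) // 2 + (0x80 if row <= 62 else 0xC0)
    if row % 2 == 0:
        s2 = cell + 0x9E
    elif cell <= 63:
        s2 = cell + 0x3F
    else:
        s2 = cell + 0x40
    return (s1 << 8) | s2


def mac_vertical_code(code: int, row_offset: int = 84) -> int:
    """Position of a vertical form: same cell, row_offset rows further down.

    84 is the MacJapanese offset; 10 matches the older System 7.1 fonts.
    Does not check whether a vertical form actually exists.
    """
    row, cell = sjis_to_kuten(code)
    return kuten_to_sjis(row + row_offset, cell)


print(hex(mac_vertical_code(0x8141)))      # 、 (row 1, cell 2) -> 0xeb41
print(hex(mac_vertical_code(0x8141, 10)))  # older fonts -> 0x8641
```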

Shift_JISx0213 and Shift_JIS-2004

The newer JIS X 0213 standard defines an extended variant of Shift_JIS referred to as Shift_JISx0213 (in a previous version of the standard) or Shift_JIS-2004. It is a superset of standard Shift JIS.^[14]

In order to represent the allocated rows on both planes of JIS X 0213, Shift_JIS-2004 uses the following method of mapping codepoints.^[15]

$$
s_{1}=\begin{cases}
\left\lfloor \frac{k+257}{2}\right\rfloor & \text{if } m=1 \text{ and } 1\leq k\leq 62\\
\left\lfloor \frac{k+385}{2}\right\rfloor & \text{if } m=1 \text{ and } 63\leq k\leq 94\\
\left\lfloor \frac{k+479}{2}\right\rfloor - \left\lfloor \frac{k}{8}\right\rfloor \times 3 & \text{if } m=2 \text{ and } k\in\{1,3,4,5,8,12,13,14,15\}\\
\left\lfloor \frac{k+411}{2}\right\rfloor & \text{if } m=2 \text{ and } 78\leq k\leq 94
\end{cases}
$$

$$
s_{2}=\begin{cases}
t+63 & \text{if } k \text{ is odd and } 1\leq t\leq 63\\
t+64 & \text{if } k \text{ is odd and } 64\leq t\leq 94\\
t+158 & \text{if } k \text{ is even}
\end{cases}
$$

In the above, $s_{1}s_{2}$ is a two-byte Shift_JIS-2004 sequence, $m$ is the plane (面, men, surface) number (1 or 2), $k$ is the row (区, ku, ward) number (1-94) and $t$ is the cell (点, ten, point) number (1-94). The ku and ten numbers are equivalent to $j_{1}-32$ and $j_{2}-32$ respectively, where $j_{1}j_{2}$ is a two-byte JIS sequence referencing a given plane.
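The following is a direct transcription of the two Shift_JIS-2004 formulas, taking the plane, row and cell numbers as inputs. It is a sketch, not a complete encoder. `sjis2004_bytes` is a hypothetical name, and mapping Unicode text to plane/row/cell numbers, which requires the JIS X 0213 tables, is left out. The last example shows that plane 2 rows 1 and 8 share the lead byte 0xF0 and differ only in the second-byte range, as the odd/even split in $s_{2}$ implies.

```python
PLANE2_LOW_ROWS = (1, 3, 4, 5, 8, 12, 13, 14, 15)  # plane 2 rows outside 78..94


def sjis2004_bytes(m: int, k: int, t: int) -> bytes:
    """Encode JIS X 0213 plane m, row k, cell t as Shift_JIS-2004."""
    if not (1 <= k <= 94 and 1 <= t <= 94):
        raise ValueError("row and cell must be in 1..94")
    if m == 1:
        s1 = (k + 257) // 2 if k <= 62 else (k + 385) // 2
    elif m == 2 and k in PLANE2_LOW_ROWS:
        s1 = (k + 479) // 2 - (k // 8) * 3
    elif m == 2 and 78 <= k <= 94:
        s1 = (k + 411) // 2
    else:
        raise ValueError(f"plane {m}, row {k} has no Shift_JIS-2004 encoding")
    if k % 2 == 0:
        s2 = t + 158
    elif t <= 63:
        s2 = t + 63
    else:
        s2 = t + 64
    return bytes([s1, s2])


print(sjis2004_bytes(1, 89, 1).hex())  # ed40
print(sjis2004_bytes(2, 1, 1).hex())   # f040
print(sjis2004_bytes(2, 8, 1).hex())   # f09f: rows 1 and 8 share lead byte 0xF0
```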

The same set of characters can be represented by EUC-JIS-2004, the EUC-JP based counterpart.

Some of the additions collide with popular Shift JIS extensions, including Windows codepage 932 which is used in web standards (see above). For example, compare plane 1 row 89 in JIS X 0213 (beginning 硃, 硎, 硏…)^[16] to row 89 in the JIS X 0208 variant defined in web standards (beginning 纊, 褜, 鍈…).^[17] In addition, some of the characters map to Unicode characters beyond the BMP.
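The collision can be observed by decoding the same two bytes with Python's `cp932` and `shift_jis_2004` codecs. The sketch assumes that these codecs implement Windows-31J and Shift_JIS-2004 respectively. 0xED 0x40 is the first cell of row 89 in both layouts.

```python
raw = b"\xed\x40"  # row 89, cell 1 in both layouts

as_cp932 = raw.decode("cp932")          # Windows-31J, the web's "shift_jis"
as_2004 = raw.decode("shift_jis_2004")  # JIS X 0213 plane 1

for label, ch in (("cp932", as_cp932), ("shift_jis_2004", as_2004)):
    print(f"{label:>15}: {ch} U+{ord(ch):04X}")
print(as_cp932 == as_2004)  # same bytes, different characters
```

If the installed codec tables follow the sources cited above, the two lines should show 纊 and 硃 respectively.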

Other variants

The space with lead bytes 0xF5 to 0xF9 (beyond the region used for JIS X 0208) is used by Japanese mobile phone operators for pictographs for use in E-mail.^[18] KDDI goes further and defines hundreds more in the space with lead bytes 0xF3 and 0xF4.^[19]

Beyond even this, there have been numerous minor variations made on Shift JIS, with individual characters here and there altered. Most of these extensions and variants have no IANA registration, so there is much scope for confusion when the extensions are used.

One variant is needed when Shift JIS text is placed in string literals in C and similar programming languages. It doubles the byte 0x5C when it appears as the second byte of a two-byte character. It does not double 0x5C when it appears as a single "¥" (ASCII: "\") character, because 0x5C begins an escape sequence. This is best handled by a special editor that encodes Shift JIS this way.
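The doubling rule is small enough to sketch directly, standing in for the special editor mentioned above. `escape_for_c_literal` is a hypothetical name. The sketch assumes that lead bytes are 0x81–0x9F and 0xE0–0xFC, which includes common vendor extensions. It also assumes that a standalone 0x5C in the input is meant as the escape character.

```python
def is_lead_byte(b: int) -> bool:
    # Assumption: lead bytes of two-byte Shift JIS characters, incl. vendor extensions.
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def escape_for_c_literal(data: bytes) -> bytes:
    """Double every 0x5C that is the second byte of a two-byte character.

    A standalone 0x5C (the yen sign, used as the escape character) is
    copied unchanged, so escape sequences such as ¥n keep their meaning.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        if is_lead_byte(data[i]) and i + 1 < len(data):
            out += data[i:i + 2]
            if data[i + 1] == 0x5C:
                out.append(0x5C)  # the compiler reads the pair back as one 0x5C
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


literal = "表".encode("shift_jis") + b"\\n"  # 表 followed by the escape ¥n
print(escape_for_c_literal(literal).hex(" "))  # 95 5c 5c 5c 6e
```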

Shift JIS byte map

As defined in JIS X 0208:1997

The chart below gives the detailed meaning of each byte in a stream encoded in standard Shift JIS (conforming to JIS X 0208:1997).

First byte
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
0	␀	␁	␂	␃	␄	␅	␆	␇	␈	␉	␊	␋	␌	␍	␎	␏
1	␐	␑	␒	␓	␔	␕	␖	␗	␘	␙	␚	␛	␜	␝	␞	␟
2	␠	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/
3	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
4	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
5	P	Q	R	S	T	U	V	W	X	Y	Z	[	¥	]	^	_
6	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
7	p	q	r	s	t	u	v	w	x	y	z	{	\|	}	‾	␡
8
9
A		｡	｢	｣	､	･	ｦ	ｧ	ｨ	ｩ	ｪ	ｫ	ｬ	ｭ	ｮ	ｯ
B	ｰ	ｱ	ｲ	ｳ	ｴ	ｵ	ｶ	ｷ	ｸ	ｹ	ｺ	ｻ	ｼ	ｽ	ｾ	ｿ
C	ﾀ	ﾁ	ﾂ	ﾃ	ﾄ	ﾅ	ﾆ	ﾇ	ﾈ	ﾉ	ﾊ	ﾋ	ﾌ	ﾍ	ﾎ	ﾏ
D	ﾐ	ﾑ	ﾒ	ﾓ	ﾔ	ﾕ	ﾖ	ﾗ	ﾘ	ﾙ	ﾚ	ﾛ	ﾜ	ﾝ	ﾞ	ﾟ
E
F

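Rows 8, 9, E and F of the first-byte table, and cell A0, contain no single-byte characters. The bytes 0x81–0x9F and 0xE0–0xEF are first bytes of double-byte JIS X 0208 characters (covering JIS rows 1 to 94, not all of which are allocated). The bytes 0x80, 0xA0 and 0xF0–0xFF are unused as first bytes.

Second byte
Range	Meaning
0x00–0x3F	Unused as second byte of a JIS X 0208 character
0x40–0x7E	Second byte of a double-byte JIS X 0208 character whose first half of the JIS sequence was odd
0x7F	Unused as second byte of a JIS X 0208 character
0x80–0x9E	Second byte of a double-byte JIS X 0208 character whose first half of the JIS sequence was odd
0x9F–0xFC	Second byte of a double-byte JIS X 0208 character whose first half of the JIS sequence was even
0xFD–0xFF	Unused as second byte of a JIS X 0208 character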

	Non printable ASCII character
	Unaltered ASCII character
	Modified ASCII character
	Single-byte half-width katakana
	First byte of a double-byte JIS X 0208 character
	Unused as first byte of a JIS X 0208 character
	Second byte of a double-byte JIS X 0208 character whose first half of the JIS sequence was odd
	Second byte of a double-byte JIS X 0208 character whose first half of the JIS sequence was even
	Unused as second byte of a JIS X 0208 character

With vendor or JIS X 0213 extensions

Some of the bytes which are not used for single-byte codes or initial bytes in JIS X 0208:1997 are used by certain extensions, resulting in the layout detailed in the chart below.

First byte
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
0	␀	␁	␂	␃	␄	␅	␆	␇	␈	␉	␊	␋	␌	␍	␎	␏
1	␐	␑	␒	␓	␔	␕	␖	␗	␘	␙	␚	␛	␜	␝	␞	␟
2	␠	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/
3	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
4	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
5	P	Q	R	S	T	U	V	W	X	Y	Z	[	¥	]	^	_
6	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
7	p	q	r	s	t	u	v	w	x	y	z	{	\|	}	‾	␡
8
9
A		｡	｢	｣	､	･	ｦ	ｧ	ｨ	ｩ	ｪ	ｫ	ｬ	ｭ	ｮ	ｯ
B	ｰ	ｱ	ｲ	ｳ	ｴ	ｵ	ｶ	ｷ	ｸ	ｹ	ｺ	ｻ	ｼ	ｽ	ｾ	ｿ
C	ﾀ	ﾁ	ﾂ	ﾃ	ﾄ	ﾅ	ﾆ	ﾇ	ﾈ	ﾉ	ﾊ	ﾋ	ﾌ	ﾍ	ﾎ	ﾏ
D	ﾐ	ﾑ	ﾒ	ﾓ	ﾔ	ﾕ	ﾖ	ﾗ	ﾘ	ﾙ	ﾚ	ﾛ	ﾜ	ﾝ	ﾞ	ﾟ
E
F

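Rows 8, 9, E and F of the first-byte table, and cell A0, contain no single-byte characters in standard Shift JIS. The bytes 0x81–0x9F and 0xE0–0xEF are first bytes within the JIS X 0208 encoding space; rows in this space that JIS X 0208 leaves unallocated are used by JIS X 0213 plane 1 or by vendor extensions. The bytes 0xF0–0xFC are first bytes beyond JIS X 0208, used for JIS X 0213 plane 2 or for unrelated extensions. The bytes 0x80, 0xA0 and 0xFD–0xFF are not used as first bytes, but some single-byte extensions (such as MacJapanese) assign characters to them.

Second byte
Range	Meaning
0x00–0x3F	Unused as second byte of a double-byte character
0x40–0x7E	Second byte of a double-byte character whose first half of the JIS sequence was odd
0x7F	Unused as second byte of a double-byte character
0x80–0x9E	Second byte of a double-byte character whose first half of the JIS sequence was odd
0x9F–0xFC	Second byte of a double-byte character whose first half of the JIS sequence was even
0xFD–0xFF	Unused as second byte of a double-byte character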

	Non printable ASCII character
	Unaltered ASCII character
	Modified ASCII character
	Single-byte half-width katakana
	First byte of a double-byte character, used by JIS X 0208 (and by extensions such as JIS X 0213 plane 1)
	First byte of a double-byte character, unallocated in JIS X 0208 but used by JIS X 0213 plane 1 or by vendor extensions
	First byte of a double-byte character beyond JIS X 0208, used for JIS X 0213 plane 2 or for unrelated extensions
	Not used as first byte, used by some single byte extensions
	Second byte of a double-byte character whose first half of the JIS sequence was odd
	Second byte of a double-byte character whose first half of the JIS sequence was even
	Unused as second byte of a double-byte character

References

[1] https://w3techs.com/technologies/history_overview/character_encoding
[2] j₁ and j₂ are each in the range 33 (0x21) to 126 (0x7E) inclusive, i.e. the 7-bit values excluding the control characters 0–31 (0x00–0x1F) and 127 (0x7F) and the space (32, 0x20).
[3] "Character Sets". IANA.
[4] "Encoding.WindowsCodePage Property - .NET Framework (current version)". MSDN. Microsoft.
[5] "IBM-943 and IBM-932". IBM Knowledge Center. IBM.
[6] "CP932.TXT". Unicode Consortium.
[7] "3.1.1 Details of Problems". Problems and Solutions for Unicode and User/Vendor Defined Characters. The Open Group Japan. Archived from the original on 1999-02-03.
[8] Kaplan, Michael S. (2005-09-17). "When is a backslash not a backslash?".
[9] Kaplan, Michael S. (2007-05-26). "The PUA outside of Unicode". Sorting it all out.
[10] "5. Indexes (§ Index jis0208)". Encoding Standard. WHATWG.
[11] "4.2. Names and labels". Encoding Standard. WHATWG.
[12] "JAPANESE.TXT: Map (external version) from Mac OS Japanese encoding to Unicode 2.1 and later". Apple Computer, Inc.; Unicode Consortium.
[13] "Encoding Variants for MacJapanese". Apple Developer Documentation. Apple.
[14] "JIS X 0213 Code Mapping Tables". x0213.org.
[15] "JIS X 0213の代表的な符号化方式 § Shift_JIS-2004" [Representative encoding schemes of JIS X 0213] (in Japanese). Hexadecimal numbers in the source have been converted to decimal for display.
[16] "233: Japanese Graphic Character Set for Information Interchange, Plane 1" (PDF). IPSJ.
[17] "Index jis0208 visualization". Encoding Standard. WHATWG.
[18] "Original Emoji from DoCoMo". FileFormat.info.
[19] "Original Emoji from KDDI". FileFormat.info.

External links

Shift-JIS Kanji Table – a table of the non-ASCII part of the codeset
"Windows Codepage 932". Microsoft. May 1, 2005. Archived from the original on 2008-03-07. – Microsoft's definition
Forms of Shift-JIS in ICU (International Components for Unicode)

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
