Code page 936 (Microsoft Windows)

Windows code page 936 (abbreviated MS936, Windows-936 or, ambiguously, CP936)[1] is Microsoft's character encoding for Simplified Chinese, one of the four double-byte character sets (DBCSs) for East Asian languages. Originally, Windows-936 covered GB 2312 (in its EUC-CN form), but it was expanded to cover most of GBK with the release of Windows 95.
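
The difference in coverage between GB 2312 and the GBK extension can be illustrated with a minimal Python sketch. It assumes that the standard library's `gb2312` and `gbk` codecs are acceptable stand-ins for EUC-CN and for the GBK coverage of Windows-936 respectively; the helper name `can_encode` and the two sample characters are chosen here for illustration only.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` has a mapping in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# U+4E2D is a common simplified character present in GB 2312.
# U+570B is a traditional form that GB 2312 lacks but GBK includes.
samples = ["\u4e2d", "\u570b"]

for ch in samples:
    print(
        f"U+{ord(ch):04X}",
        "gb2312:", can_encode(ch, "gb2312"),
        "gbk:", can_encode(ch, "gbk"),
    )
```

Under these assumptions the first character is reported as encodable in both codecs, while the second is encodable only in `gbk`.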

IBM's code page 936[2] is a different encoding for Simplified Chinese. International Components for Unicode, however, does not include an IBM-936 codec and uses the Windows code page for the "cp936" label.[1] IBM's code page for GBK coverage is code page 1386 (CP1386 or IBM-1386), which is defined as a combination of the single-byte code page 1114 and the double-byte code page 1385.[3]

Windows-936 was superseded by code page 54936 (GB 18030), but as of 2014 it was still in widespread use. The Windows command prompt uses CP936 as the default code page on Simplified Chinese installations, even though support for part of GB 18030 was made mandatory for all software products sold in China. In 2002, the IANA Internet name GBK was registered with Windows-936's mapping,[4][5] making it the de facto GBK definition on the Internet.

The concepts of "Windows-936", "GBK",[lower-alpha 1] "GB2312" and "EUC-CN" are sometimes confused in various software products. Code pages MS936 and 1386 are not identical to GBK because a code page encodes characters, whereas GBK only defines code points. In addition, the Euro sign (€), encoded as 0x80 in both Windows-936 and IBM-1386, is not defined in GBK. On the other hand, 95 characters defined in GBK were initially not encoded into Windows-936.

This was partly resolved in later versions of Windows: as of Windows 7, all GBK characters not in the Unicode BMP Private Use Area can be displayed using code page 936, but encoding the 95 characters was still not supported as of 2014. Nevertheless, "CP936" and "GBK" are often used interchangeably because of the popularity of Microsoft products in the Chinese market when GBK was published.
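
The single-byte Euro sign at 0x80 described above can be handled with a small hand-written decoder. The following Python sketch is illustrative only: the function name `decode_windows936` is hypothetical, Python's `gbk` codec is assumed to be an adequate stand-in for the double-byte part of Windows-936, and no claim is made that any real converter is implemented this way. The scan is needed because 0x80 is also a legal trail byte in double-byte sequences, so a blind byte replacement would corrupt other characters.

```python
EURO_BYTE = 0x80  # single-byte Euro sign in Windows-936, per the text above


def decode_windows936(data: bytes) -> str:
    """Decode `data`, treating 0x80 in lead position as the Euro sign.

    Hypothetical helper. Bytes below 0x80 are taken as ASCII, 0x80 in lead
    position becomes U+20AC, and any other byte is treated as the lead byte
    of a two-byte sequence handed to the `gbk` codec. Malformed or truncated
    input raises UnicodeDecodeError.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
        elif b == EURO_BYTE:
            out.append("\u20ac")
            i += 1
        else:
            # The trail byte may itself be 0x80, so it is consumed together
            # with its lead byte instead of being rescanned.
            out.append(data[i:i + 2].decode("gbk"))
            i += 2
    return "".join(out)


decoded = decode_windows936(b"\x80\xd6\xd0")
print([f"U+{ord(c):04X}" for c in decoded])  # ['U+20AC', 'U+4E2D']
```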

Since GBK superseded GB 2312 long ago, these two terms have also become virtually equivalent to many users, so "Windows-936", "GBK" and "GB 2312" are understood by many to mean the same thing, although they actually differ significantly. Rather than supporting EUC-CN / GB 2312 precisely, most modern Windows-based software products mean partial GBK support via Windows-936 when they offer "GB 2312" as a character encoding option. This can be observed in products such as Microsoft Internet Explorer and Notepad++.
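
The practice of treating a "GB 2312" label as GBK can be sketched as a small label-widening decoder. This is an assumption about how such software might behave, not a description of any particular product; the function name `decode_labelled` and the alias table `WIDENED_LABELS` are hypothetical, and Python's `gbk` codec again stands in for Windows-936.

```python
# Hypothetical alias table: labels that are silently widened to GBK.
WIDENED_LABELS = {
    "gb2312": "gbk",
    "euc-cn": "gbk",
    "cp936": "gbk",
    "windows-936": "gbk",
}


def decode_labelled(data: bytes, label: str) -> str:
    """Decode `data` using `label`, widening GB 2312-style labels to GBK."""
    key = label.strip().lower().replace(" ", "")
    codec = WIDENED_LABELS.get(key, key)
    return data.decode(codec)


# Bytes for U+570B, a character that GBK covers but GB 2312 does not.
payload = "\u570b".encode("gbk")

# A strict GB 2312 decoder rejects the payload ...
try:
    payload.decode("gb2312")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ... while the widened label accepts it.
widened = decode_labelled(payload, "GB 2312")
print("strict gb2312 accepted:", strict_ok)
print("widened result:", f"U+{ord(widened):04X}")
```

Widening the label trades strictness for compatibility: text that is genuinely GB 2312 decodes identically, while text that uses GBK extensions no longer fails.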

Notes

  1. GBK 1.0

References

  1. "windows-936-2000 (alias cp936)". ICU Demonstration - Converter Explorer. International Components for Unicode.
  2. "Coded character set identifiers - CCSID 936". IBM Globalization. IBM. Archived from the original on 2014-12-01.
  3. "Coded character set identifiers - CCSID 1386". IBM. Archived from the original on 2014-11-29.
  4. "Character Sets". Retrieved 3 October 2016.
  5. "Application of IANA Charset Registration for GBK".

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.