Tags (Unicode block)

Tags
Range U+E0000..U+E007F
(128 code points)
Plane SSP
Scripts Common
Assigned 97 code points
Unused 31 reserved code points
1 deprecated
Unicode version history
3.1 97 (+97)
Note: [1][2]

Tags is a Unicode block containing formatting tag characters (language tag and ASCII character tags).

U+E0001, U+E0020–U+E007F were originally intended for invisibly tagging texts by language[3] but that use is no longer recommended.[4] All of those characters were deprecated in Unicode 5.1.

With the release of Unicode 8.0, U+E0020–U+E007E are no longer deprecated characters. The change was made "to clear the way for the potential future use of tag characters for a purpose other than to represent language tags".[5] Unicode states that "the use of tag characters to represent language tags in a plain text stream is still a deprecated mechanism for conveying language information about text".[5]

With the release of Unicode 9.0, U+E007F is no longer a deprecated character. (U+E0001 LANGUAGE TAG remains deprecated.) Emoji 5.0, released in March 2017, treats these characters as emoji for use as modifiers in special sequences. The only usage specified is representing the flags of regions, complementing the use of Regional Indicator Symbols for national flags.[6] These sequences consist of U+1F3F4 🏴 WAVING BLACK FLAG, followed by a sequence of tags corresponding to the region as coded in the CLDR, followed by U+E007F CANCEL TAG. For example, using the tags for "gbeng" (🏴󠁧󠁢󠁥󠁮󠁧󠁿) will cause some systems to display the flag of England, those for "gbsct" (🏴󠁧󠁢󠁳󠁣󠁴󠁿) the flag of Scotland, and those for "gbwls" (🏴󠁧󠁢󠁷󠁬󠁳󠁿) the flag of Wales.[6] Sequences representing other subnational flags (for example, those of US states) are also possible using this mechanism, but as of Unicode version 11.0 only the three flag sequences listed above are "Recommended for General Interchange" by the Unicode Consortium, meaning they are "most likely to be widely supported across multiple platforms".[7]
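The structure of such a flag sequence can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name subdivision_flag and its validation rule (ASCII letters and digits only) are assumptions made for the example, and whether a given sequence is actually rendered as a flag depends on the platform.

```python
BLACK_FLAG = "\U0001F3F4"  # U+1F3F4 WAVING BLACK FLAG
CANCEL_TAG = "\U000E007F"  # U+E007F CANCEL TAG
TAG_OFFSET = 0xE0000       # tag character = ASCII code point + 0xE0000


def subdivision_flag(code: str) -> str:
    """Build a tag sequence for a region code such as "gbeng".

    Hypothetical helper: the code is lowercased and must consist of
    ASCII letters and digits only (an assumption of this sketch).
    """
    code = code.lower()
    if not code or not all(
        c.isascii() and (c.isdigit() or "a" <= c <= "z") for c in code
    ):
        raise ValueError(f"unsupported region code: {code!r}")
    # Map each ASCII character to the tag character at the same offset.
    tags = "".join(chr(TAG_OFFSET + ord(c)) for c in code)
    return BLACK_FLAG + tags + CANCEL_TAG


# Show the code points rather than the glyph, since rendering varies.
print([f"U+{ord(c):04X}" for c in subdivision_flag("gbeng")])
# ['U+1F3F4', 'U+E0067', 'U+E0062', 'U+E0065', 'U+E006E', 'U+E0067', 'U+E007F']
```

The function builds a sequence for any well-formed code, but, as noted above, only "gbeng", "gbsct" and "gbwls" are recommended for general interchange as of Unicode 11.0.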

Tags[1][2][3]
Official Unicode Consortium code chart (PDF)
         0     1     2     3     4     5     6     7     8     9     A     B     C     D     E     F
U+E000x        LANG
U+E001x
U+E002x  SP    !     "     #     $     %     &     '     (     )     *     +     ,     -     .     /
U+E003x  0     1     2     3     4     5     6     7     8     9     :     ;     <     =     >     ?
U+E004x  @     A     B     C     D     E     F     G     H     I     J     K     L     M     N     O
U+E005x  P     Q     R     S     T     U     V     W     X     Y     Z     [     \     ]     ^     _
U+E006x  `     a     b     c     d     e     f     g     h     i     j     k     l     m     n     o
U+E007x  p     q     r     s     t     u     v     w     x     y     z     {     |     }     ~     END
Notes
1.^ As of Unicode version 11.0
2.^ Grey areas indicate non-assigned code points
3.^ Unicode code points U+E0001 and U+E0020 through U+E007F were deprecated with Unicode version 5.1; however, as of Unicode version 9.0, only U+E0001 remains deprecated
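The chart mirrors ASCII: each tag character from U+E0020 to U+E007E sits exactly 0xE0000 above the ASCII character it stands for. The following Python sketch uses that offset to classify code points in the block and to recover the ASCII text hidden in a tag sequence. The function names describe and tags_to_ascii are hypothetical, chosen for this example, and the labels returned are informal descriptions rather than official character names.

```python
TAG_OFFSET = 0xE0000  # distance between an ASCII character and its tag


def describe(ch: str) -> str:
    """Informally classify a character relative to the Tags block."""
    cp = ord(ch)
    if cp == 0xE0001:
        return "LANGUAGE TAG (deprecated)"
    if cp == 0xE007F:
        return "CANCEL TAG"
    if 0xE0020 <= cp <= 0xE007E:
        return "tag for " + repr(chr(cp - TAG_OFFSET))
    if 0xE0000 <= cp <= 0xE007F:
        return "reserved"
    return "not in Tags block"


def tags_to_ascii(text: str) -> str:
    """Return the ASCII characters encoded by the tag characters in text."""
    return "".join(
        chr(ord(c) - TAG_OFFSET) for c in text if 0xE0020 <= ord(c) <= 0xE007E
    )


block = [chr(cp) for cp in range(0xE0000, 0xE0080)]
assigned = [c for c in block if describe(c) != "reserved"]
print(len(block), len(assigned), len(block) - len(assigned))  # 128 97 31
```

The counts printed at the end agree with the block summary: 128 code points, of which 97 are assigned and 31 are reserved.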

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Tags block:

Version | Final code points[a] | Count | L2 ID | WG2 ID | Document
3.1 | U+E0001 | 1 | L2/97-203 | | Whistler, Ken; Adams, Glenn (1997-08-05), Plane 14 characters for generic tags
 | | | L2/97-171R2 | | Whistler, Ken (1997-09-18), Plane 14 Characters for Generic Tags
 | | | L2/97-256 | | Allouche, Mati (1997-10-20), Comments on Plane 14 Position Paper
 | | | L2/97-255R | | Aliprand, Joan (1997-12-03), "3.B. Lightweight language tagging", Approved Minutes – UTC #73 & L2 #170 joint meeting, Palo Alto, CA – August 4-5, 1997
 | | | L2/98-027 | N1670 | Plane 14 characters for language tags, 1997-12-12
 | | | L2/98-039 | | Aliprand, Joan; Winkler, Arnold (1998-02-24), "2.C REVISED PROPOSALS", Preliminary Minutes – UTC #74 & L2 #171, Mountain View, CA – December 5, 1997
 | | | | N1697 | US contribution for the definition of an updated scope for ISO/IEC 10646 Part 2, 1998-03-02
 | | | L2/98-281R | | Aliprand, Joan (1998-07-31), "IETF and W3C Issues", Unconfirmed Minutes – UTC #77 & NCITS Subgroup L2 # 174 JOINT MEETING, Redmond, WA -- July 29-31, 1998
 | | | L2/02-166R2 | | Moore, Lisa (2002-08-09), "Character Deprecation", UTC #91 Minutes
 | U+E0020..E007F | 96 | L2/16-042 | | Fonts, Agustin; Pournader, Roozbeh (2015-01-26), Clarifications Requested for "Full Emoji Data" and Emoji Flags
 | | | L2/15-145 | | Edberg, Peter (2015-05-04), Proposal for additional regional indicator symbols
 | | | L2/15-107 | | Moore, Lisa (2015-05-12), "E.1.3 Proposal for additional regional indicator symbols", UTC #143 Minutes
 | | | L2/15-190 | | Edberg, Peter (2015-06-29), PRI #299 Background: Representing Additional Types of Flags
 | | | L2/15-206 | | Davis, Mark (2015-07-25), Region / Subdivision validity for flags
 | | | L2/16-180 | | Burge, Jeremy; Williams, Owen (2016-07-07), Proposal to include Emoji Flags for England, Scotland and Wales
 | | | L2/17-048 | | Pournader, Roozbeh (2017-01-24), Feedback on PRI 343 (Unicode Emoji 5.0)
 | | | L2/17-086 | | Burge, Jeremy; et al. (2017-03-27), Add ZWJ, VS-16, Keycaps & Tags to Emoji_Component
  a. ↑ Proposed code points and character names may differ from final code points and names

References

  1. ↑ "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
  2. ↑ "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.
  3. ↑ "RFC2482: Language Tagging in Unicode Plain Text". Network Working Group. January 1999.
  4. ↑ "RFC6082: Deprecating Unicode Language Tag Characters: RFC 2482 is Historic". Internet Engineering Task Force (IETF). November 2010.
  5. ↑ "Unicode 8.0.0, Implications for Migration". Unicode Consortium.
  6. ↑ "UTR #51: Unicode Emoji". Unicode Consortium. 2017-05-18.
  7. ↑ "emoji-sequences.txt". Unicode Consortium. Retrieved 27 April 2018.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.