Basic Latin (Unicode block)

The Basic Latin block, formally named C0 Controls and Basic Latin, is the first block of the Unicode Standard and the only block whose characters are encoded in a single byte in UTF-8. It contains all the letters, digits, punctuation and control codes of the ASCII encoding. The block ranges from U+0000 to U+007F and contains 128 characters: the C0 controls, ASCII punctuation and symbols, ASCII digits, the uppercase and lowercase letters of the English alphabet, and the Delete control character.
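The one-byte property can be checked directly. The following Python sketch uses only the built-in `str.encode`; the helper name `utf8_length` is illustrative and not part of any standard. It confirms that each code point in the block encodes to a single UTF-8 byte whose value equals the code point, and that the first code point past the block (U+0080) needs two bytes.

```python
def utf8_length(code_point: int) -> int:
    """Return the number of bytes UTF-8 uses for one code point."""
    return len(chr(code_point).encode("utf-8"))


# Every code point in U+0000..U+007F should be one byte equal to its own value.
basic_latin = range(0x00, 0x80)
all_single_byte = all(
    chr(cp).encode("utf-8") == bytes([cp]) for cp in basic_latin
)

# U+0080 lies just outside the block and takes two bytes in UTF-8.
first_outside = utf8_length(0x80)
```

This is why any pure-ASCII file is also a valid UTF-8 file with identical bytes.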

C0 Controls and Basic Latin
Range: U+0000..U+007F (128 code points)
Plane: BMP
Scripts: Latin (52 char.), Common (76 char.)
Major alphabets: English, French, German, Spanish, Vietnamese
Symbol sets: Arabic numerals, Punctuation
Assigned: 128 code points (33 Control or Format)
Unused: 0 reserved code points
Source standards: ISO/IEC 8859, ISO 646
Unicode version history: 1.0.0: 128 (+128)
Note: [1][2]

The Basic Latin block was included in its present form from version 1.0.0 of the Unicode Standard, without addition or alteration of the character repertoire.[3]

Table of characters

Code Result Description Acronym
C0 controls
U+0000 Null character NUL
U+0001 Start of Heading SOH
U+0002 Start of Text STX
U+0003 End-of-text character ETX
U+0004 End-of-transmission character EOT
U+0005 Enquiry character ENQ
U+0006 Acknowledge character ACK
U+0007 Bell character BEL
U+0008 Backspace BS
U+0009 Horizontal tab HT
U+000A Line feed LF
U+000B Vertical tab VT
U+000C Form feed FF
U+000D Carriage return CR
U+000E Shift Out SO
U+000F Shift In SI
U+0010 Data Link Escape DLE
U+0011 Device Control 1 DC1
U+0012 Device Control 2 DC2
U+0013 Device Control 3 DC3
U+0014 Device Control 4 DC4
U+0015 Negative-acknowledge character NAK
U+0016 Synchronous Idle SYN
U+0017 End of Transmission Block ETB
U+0018 Cancel character CAN
U+0019 End of Medium EM
U+001A Substitute character SUB
U+001B Escape character ESC
U+001C File Separator FS
U+001D Group Separator GS
U+001E Record Separator RS
U+001F Unit Separator US
ASCII punctuation and symbols
U+0020   Space SP
U+0021 ! Exclamation mark EXC
U+0022 " Quotation mark QUO
U+0023 # Number sign
U+0024 $ Dollar sign
U+0025 % Percent sign
U+0026 & Ampersand
U+0027 ' Apostrophe
U+0028 ( Left parenthesis
U+0029 ) Right parenthesis
U+002A * Asterisk
U+002B + Plus sign
U+002C , Comma
U+002D - Hyphen-minus
U+002E . Full stop or period
U+002F / Solidus or Slash
ASCII digits
U+0030 0 Digit Zero
U+0031 1 Digit One
U+0032 2 Digit Two
U+0033 3 Digit Three
U+0034 4 Digit Four
U+0035 5 Digit Five
U+0036 6 Digit Six
U+0037 7 Digit Seven
U+0038 8 Digit Eight
U+0039 9 Digit Nine
ASCII punctuation and symbols
U+003A : Colon
U+003B ; Semicolon
U+003C < Less-than sign
U+003D = Equal sign
U+003E > Greater-than sign
U+003F ? Question mark
U+0040 @ At sign or Commercial at
Uppercase Latin alphabet
U+0041 A Latin Capital letter A
U+0042 B Latin Capital letter B
U+0043 C Latin Capital letter C
U+0044 D Latin Capital letter D
U+0045 E Latin Capital letter E
U+0046 F Latin Capital letter F
U+0047 G Latin Capital letter G
U+0048 H Latin Capital letter H
U+0049 I Latin Capital letter I
U+004A J Latin Capital letter J
U+004B K Latin Capital letter K
U+004C L Latin Capital letter L
U+004D M Latin Capital letter M
U+004E N Latin Capital letter N
U+004F O Latin Capital letter O
U+0050 P Latin Capital letter P
U+0051 Q Latin Capital letter Q
U+0052 R Latin Capital letter R
U+0053 S Latin Capital letter S
U+0054 T Latin Capital letter T
U+0055 U Latin Capital letter U
U+0056 V Latin Capital letter V
U+0057 W Latin Capital letter W
U+0058 X Latin Capital letter X
U+0059 Y Latin Capital letter Y
U+005A Z Latin Capital letter Z
ASCII punctuation and symbols
U+005B [ Left Square Bracket
U+005C \ Backslash [A]
U+005D ] Right Square Bracket
U+005E ^ Circumflex accent
U+005F _ Low line
U+0060 ` Grave accent
Lowercase Latin alphabet
U+0061 a Latin Small Letter A
U+0062 b Latin Small Letter B
U+0063 c Latin Small Letter C
U+0064 d Latin Small Letter D
U+0065 e Latin Small Letter E
U+0066 f Latin Small Letter F
U+0067 g Latin Small Letter G
U+0068 h Latin Small Letter H
U+0069 i Latin Small Letter I
U+006A j Latin Small Letter J
U+006B k Latin Small Letter K
U+006C l Latin Small Letter L
U+006D m Latin Small Letter M
U+006E n Latin Small Letter N
U+006F o Latin Small Letter O
U+0070 p Latin Small Letter P
U+0071 q Latin Small Letter Q
U+0072 r Latin Small Letter R
U+0073 s Latin Small Letter S
U+0074 t Latin Small Letter T
U+0075 u Latin Small Letter U
U+0076 v Latin Small Letter V
U+0077 w Latin Small Letter W
U+0078 x Latin Small Letter X
U+0079 y Latin Small Letter Y
U+007A z Latin Small Letter Z
ASCII punctuation and symbols
U+007B { Left Curly Bracket
U+007C | Vertical bar
U+007D } Right Curly Bracket
U+007E ~ Tilde
Control character
U+007F Delete DEL
A. The character U+005C (\) may be displayed as a yen (¥) or won (₩) sign by Japanese or Korean fonts that treat Unicode text (especially UTF-8) as a legacy character set in which the backslash was replaced with one of these signs.[4]

Subheadings

The C0 Controls and Basic Latin block contains six subheadings.[5]

C0 controls

The C0 controls, referred to as C0 ASCII control codes in version 1.0, are inherited from ASCII and other 7-bit and 8-bit encoding schemes. The alias names for the C0 controls are taken from the ISO/IEC 6429:1992 standard.[5]
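As a small illustration, assuming Python's standard `unicodedata` module, the C0 controls can be inspected programmatically. Each one has the general category `Cc` (control), and because control characters carry no formal character name, `unicodedata.name` falls back to the supplied default. The variable names here are illustrative only.

```python
import unicodedata

# The C0 controls occupy U+0000..U+001F.
c0_controls = [chr(cp) for cp in range(0x00, 0x20)]

# Collect the distinct general categories found among them.
categories = {unicodedata.category(ch) for ch in c0_controls}

# Control characters have no formal name, so name() returns the default (None).
names = {unicodedata.name(ch, None) for ch in c0_controls}
```

The identifiers listed in the table above (NUL, SOH, and so on) are therefore aliases rather than formal character names.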

ASCII punctuation and symbols

This subheading covers the standard punctuation characters, simple mathematical operators, and symbols such as the dollar sign, percent sign, ampersand, low line (underscore) and vertical bar (pipe).[5]

ASCII digits

The ASCII digits subheading contains the ten standard European digits, 0 to 9.[5]

Uppercase Latin alphabet

The Uppercase Latin alphabet subheading contains the standard 26-letter unaccented Latin alphabet in majuscule (uppercase) form.[5]

Lowercase Latin alphabet

The Lowercase Latin alphabet subheading contains the standard 26-letter unaccented Latin alphabet in minuscule (lowercase) form.[5]

Control character

The Control Character subheading contains the "Delete" character.[5]

Number of symbols, letters and control codes

The table below shows the number of letters, symbols and control codes in each of the subheadings in the C0 Controls and Basic Latin block.

Type of subheading | Number of symbols | Range of characters
C0 controls | 32 control codes | U+0000 to U+001F
ASCII punctuation and symbols | 33 punctuation marks and symbols | U+0020 to U+002F, U+003A to U+0040, U+005B to U+0060 and U+007B to U+007E
ASCII digits | 10 digits | U+0030 to U+0039
Uppercase Latin alphabet | 26 unaccented Latin letters in the majuscule | U+0041 to U+005A
Lowercase Latin alphabet | 26 unaccented Latin letters in the minuscule | U+0061 to U+007A
Control character | 1 control code (the "Delete" character) | U+007F
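The counts in the table can be re-derived from the ranges alone. The sketch below is a minimal Python check; the dictionary `SUBHEADINGS` and the helper `count_code_points` are illustrative names, and the ranges are copied from the table rather than read from the Unicode Character Database.

```python
# Inclusive (first, last) code point ranges for each subheading, from the table.
SUBHEADINGS = {
    "C0 controls": [(0x00, 0x1F)],
    "ASCII punctuation and symbols": [
        (0x20, 0x2F), (0x3A, 0x40), (0x5B, 0x60), (0x7B, 0x7E),
    ],
    "ASCII digits": [(0x30, 0x39)],
    "Uppercase Latin alphabet": [(0x41, 0x5A)],
    "Lowercase Latin alphabet": [(0x61, 0x7A)],
    "Control character": [(0x7F, 0x7F)],
}


def count_code_points(ranges):
    """Sum the sizes of a list of inclusive (first, last) ranges."""
    return sum(last - first + 1 for first, last in ranges)


counts = {name: count_code_points(r) for name, r in SUBHEADINGS.items()}
total = sum(counts.values())
```

The six counts add up to the 128 code points of the block, so the subheadings cover it without gaps or overlaps.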

Block

C0 Controls and Basic Latin[1]
Official Unicode Consortium code chart (PDF)
        0 1 2 3 4 5 6 7 8 9 A B C D E F
U+000x  NUL   SOH   STX   ETX   EOT   ENQ   ACK   BEL    BS     HT     LF     VT     FF     CR     SO     SI  
U+001x  DLE   DC1   DC2   DC3   DC4   NAK   SYN   ETB   CAN    EM    SUB   ESC    FS     GS     RS     US  
U+002x   SP   ! " # $ % & ' ( ) * + , - . /
U+003x 0 1 2 3 4 5 6 7 8 9  : ; < = > ?
U+004x @ A B C D E F G H I J K L M N O
U+005x P Q R S T U V W X Y Z [ \ ] ^ _
U+006x ` a b c d e f g h i j k l m n o
U+007x p q r s t u v w x y z { | } ~  DEL 
Notes
1. As of Unicode version 13.0

Variants

Several of the characters are defined to render as a standardized variant when followed by a variation selector.

A variant is defined for a zero with a short diagonal stroke: U+0030 DIGIT ZERO, U+FE00 VS1 (0).[6][7]

Twelve characters (#, *, and the digits) can be followed by U+FE0E VS15 or U+FE0F VS16 to create emoji variants.[8][9][10][11] They are keycap base characters, for example #️⃣ (U+0023 NUMBER SIGN U+FE0F VS16 U+20E3 COMBINING ENCLOSING KEYCAP). The VS15 version is "text presentation" while the VS16 version is "emoji-style".[7]

Emoji variation sequences
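U+ | 0023 | 002A | 0030 | 0031 | 0032 | 0033 | 0034 | 0035 | 0036 | 0037 | 0038 | 0039
base | # | * | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
base+VS15+keycap | # | * | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
base+VS16+keycap | # | * | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9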
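The sequences described in this section can be assembled from their code points. The following Python sketch is illustrative only: the constant names and the helper `keycap_sequence` are assumptions made for this example, and how a sequence actually renders depends on the font and platform.

```python
VS1 = "\uFE00"     # VARIATION SELECTOR-1
VS15 = "\uFE0E"    # VARIATION SELECTOR-15 (text presentation)
VS16 = "\uFE0F"    # VARIATION SELECTOR-16 (emoji presentation)
KEYCAP = "\u20E3"  # COMBINING ENCLOSING KEYCAP

# The twelve keycap base characters: #, * and the ten digits.
KEYCAP_BASES = "#*0123456789"


def keycap_sequence(base: str, emoji: bool = True) -> str:
    """Build base + VS15/VS16 + keycap for one of the twelve base characters."""
    if len(base) != 1 or base not in KEYCAP_BASES:
        raise ValueError(f"not a keycap base character: {base!r}")
    selector = VS16 if emoji else VS15
    return base + selector + KEYCAP


emoji_keycaps = [keycap_sequence(b) for b in KEYCAP_BASES]
text_keycaps = [keycap_sequence(b, emoji=False) for b in KEYCAP_BASES]

# The slashed-zero variant is DIGIT ZERO followed by VS1.
slashed_zero = "0" + VS1
```

For example, `keycap_sequence("#")` yields the three code points U+0023, U+FE0F and U+20E3 given in the text above.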

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Basic Latin block:

Version | Final code points[a] | Count | UTC ID | L2 ID | WG2 ID | Document
1.0.0 | U+0000..007F | 128 | | | | (to be determined)
 | | | UTC/1999-013 | | | Karlsson, Kent (1999-05-27), Tildes and micro sign decompositions
 | | | | L2/99-176R | | Moore, Lisa (1999-11-04), "Micro Sign Case Mappings", Minutes from the joint UTC/L2 meeting in Seattle, June 8-10, 1999
 | | | | L2/04-145 | | Starner, David (2004-04-30), C with stroke character examples from BAE report 1884 (Dorsey)
 | | | | L2/04-202 | | Anderson, Deborah (2004-06-07), Slashed C Feedback
 | | | | | N3046 | Suignard, Michel (2006-02-22), Improving formal definition for control characters
 | | | | | N3103 (pdf, doc) | Umamaheswaran, V. S. (2006-08-25), "M48.33", Unconfirmed minutes of WG 2 meeting 48, Mountain View, CA, USA; 2006-04-24/27
 | | | | L2/11-043 | | Freytag, Asmus; Karlsson, Kent (2011-02-02), Proposal to correct mistakes and inconsistencies in certain property assignments for super and subscripted letters
 | | | | L2/11-160 | | PRI #181 Changing General Category of Twelve Characters, 2011-05-02
 | | | | L2/11-261R2 | | Moore, Lisa (2011-08-16), "Consensus 128-C3", UTC #128 / L2 #225 Minutes, Accept Ken Whistler's recommendations in L2/11-281 on name aliases for control characters with the addition of the abbreviations BEL and NUL.
 | | | | L2/11-438[b][c] | N4182 | Edberg, Peter (2011-12-22), Emoji Variation Sequences (Revision of L2/11-429)
 | | | | L2/15-107 | | Moore, Lisa (2015-05-12), "Consensus 143-C5", UTC #143 Minutes, Add the 12 keycap sequences in emoji-data.txt as provisional named sequences in Unicode 8.0.
 | | | | L2/15-268 | | Beeton, Barbara; Freytag, Asmus; Iancu, Laurențiu; Sargent, Murray (2015-10-30), Proposal to Represent the Slashed Zero Variant of Empty Set
 | | | | L2/15-301[d][c] | | Pournader, Roozbeh (2015-11-01), A proposal for 278 standardized variation sequences for emoji
 | | | | L2/15-254 | | Moore, Lisa (2015-11-16), "B.12.1.2 Proposal to Represent the Slashed Zero Variant of Empty Set", UTC #145 Minutes
 | | | | L2/17-294 | N4914 | Lunde, Ken (2017-08-14), Proposal to add standardized variation sequence for U+FF10 FULLWIDTH DIGIT ZERO
  a. Proposed code points and character names may differ from final code points and names
  b. See also L2/10-458, L2/11-414, L2/11-415, and L2/11-429
  c. Refer to the history section of the Miscellaneous Symbols and Pictographs block for additional emoji-related documents
  d. See also L2/15-198 and L2/15-275

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.
  3. The Unicode Standard Version 1.0, Volume 1. Addison-Wesley Publishing Company, Inc. 1990. ISBN 0-201-56788-1.
  4. "When is a backslash not a backslash?". Sorting it all Out.
  5. "Unicode 6.2 code charts" (PDF). The Unicode Standard. Retrieved 1 April 2013.
  6. Beeton, Barbara; Freytag, Asmus; Iancu, Laurențiu; Sargent, Murray (2015-10-30). "L2/15-268: Proposal to Represent the Slashed Zero Variant of Empty Set" (PDF).
  7. "UTS #51 Emoji Variation Sequences". The Unicode Consortium.
  8. Edberg, Peter (2011-12-22). "L2/11-438: Emoji Variation Sequences (Revision of L2/11-429)" (PDF).
  9. Pournader, Roozbeh (2015-11-01). "L2/15-301: A proposal for 278 standardized variation sequences for emoji" (PDF).
  10. "UTR #51: Unicode Emoji". Unicode Consortium. 2020-02-11.
  11. "UCD: Emoji Data for UTR #51". Unicode Consortium. 2020-01-28.