ArmSCII


ArmSCII or ARMSCII is a set of obsolete single-byte character encodings for the Armenian alphabet defined by Armenian national standard 166-9. ArmSCII is an acronym for Armenian Standard Code for Information Interchange, by analogy with ASCII, the American Standard Code for Information Interchange. It has been superseded by the Unicode standard.

However, these encodings are not widely used. The standard was published one year after the international standard ISO 10585, which defined a different 7-bit encoding; a few years later, the Armenian encoding and mapping in the UCS (Universal Coded Character Set, ISO/IEC 10646) and in Unicode were derived from ISO 10585. In addition, there was little support in the computer industry for adding ArmSCII.

Encodings defined in the ArmSCII standard

Very few systems support these encodings. Microsoft Windows does not support them, for example. It is usually better to use Unicode for proper interchange of Armenian text for web browsers and email, since most modern computers do not support ArmSCII by default.

The following three main variants are defined:

  • ArmSCII-7 defined in AST 34.005 is a 7-bit encoding, not containing Latin characters.
  • ArmSCII-8 defined in AST 34.002 is an 8-bit encoding and a superset of ASCII.
  • ArmSCII-8A defined in AST 34.002 is an alternate 8-bit encoding and also a superset of ASCII.

Note that each ArmSCII encoding also has several minor variants, depending on the revision of the related Armenian standard, which may change the exact mapping and usage of a few punctuation characters and symbols. The standard was not made official before 1997 and was defined only informally before that, which has caused various confusions; the mappings described below are best practices according to the latest (1997) revision of the Armenian standard.

None of the ArmSCII encodings has received international approval, because all international efforts since then have gone into the UCS (Unicode and ISO 10646). ISO 10585, by contrast, was approved despite the criticism sent by the official Armenian standards body to ISO/DIS JTC 1/SC 2/WG 2, which was working on single-byte coded character sets.

ArmSCII-8 is intended for use on Unix and Windows systems, and for information interchange on the WWW and by email. However, Microsoft wanted users to adopt Unicode rather than introduce a plethora of new code pages, so ArmSCII-8 is not supported natively on Windows. Structurally, ArmSCII-8 simply places ArmSCII-7 in the range above the standard US-ASCII range.

ArmSCII-8A is intended for use on DOS and Mac systems. It is a rearrangement of ArmSCII-8 designed to work with existing DOS and Mac code, which reserves a range of code values for characters intended not for text but for presentation layout, using modified fonts. It is nevertheless considered a "hack" of the code pages it is applied over: neither DOS (nor Windows, in the "OEM" compatibility code pages used by the text-only console) nor Mac OS has ever supported this encoding natively, notably in their file systems (this is also true of the now-deprecated ISO 10585 standard). Moreover, this encoding cannot map all the punctuation characters normally needed for Armenian, so the missing characters must be approximated by fallbacks to ASCII punctuation. Some Armenian fonts display these ASCII punctuation characters with the glyphs of the Armenian characters they stand in for.
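A minimal Python sketch of such fallbacks, assuming the Unicode values and ASCII substitutes listed in parentheses in the mapping table later in this article. The dictionary `ARMSCII_8A_FALLBACKS` and the function `approximate_for_8a` are hypothetical names; a real converter would also need to encode letters and the symbols ArmSCII-8A does contain.

```python
# Hypothetical table: Armenian punctuation that ArmSCII-8A cannot encode,
# mapped to the ASCII fallback given in parentheses in the mapping table.
ARMSCII_8A_FALLBACKS = {
    "\u0589": ":",   # Armenian full stop (vertsaket) -> ASCII colon, 3A
    "\u2015": "_",   # em-dash -> ASCII underscore, 5F
    "\u2024": ".",   # middle dot (mijaket) -> ASCII full stop, 2E
    "\u055D": "`",   # separation mark (but) -> ASCII backquote, 60
    "\u2010": "-",   # dash -> ASCII hyphen-minus, 2D
    "\u055C": "~",   # exclamation mark (amanak) -> ASCII tilde, 7E
    "\u055B": "'",   # emphasis mark (shesht) -> ASCII apostrophe, 27
    "\u2033": '"',   # quotation mark -> ASCII double quotation mark, 22
}


def approximate_for_8a(text: str) -> str:
    """Replace punctuation missing from ArmSCII-8A with its ASCII fallback.

    Every other character is returned unchanged; the result is still a str,
    to be passed to a separate ArmSCII-8A encoder (not shown).
    """
    return "".join(ARMSCII_8A_FALLBACKS.get(ch, ch) for ch in text)


print(approximate_for_8a("Բարեւ\u0589"))  # the Armenian full stop becomes ":"
```

Only characters listed in the table are rewritten, so the function is safe to apply to text that also contains Latin letters or ASCII punctuation.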

ArmSCII-7

AST 34.005:1997 (ArmSCII-7)
7-bit coded character set for Armenian
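   x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF
0x–1x unused
2x SP ֎ և/§ ։ ) ( » « ― · ՝ , ‐ ֊ … ՜
3x ՛ ՞ Ա ա Բ բ Գ գ Դ դ Ե ե Զ զ Է է
4x Ը ը Թ թ Ժ ժ Ի ի Լ լ Խ խ Ծ ծ Կ կ
5x Հ հ Ձ ձ Ղ ղ Ճ ճ Մ մ Յ յ Ն ն Շ շ
6x Ո ո Չ չ Պ պ Ջ ջ Ռ ռ Ս ս Վ վ Տ տ
7x Ր ր Ց ց Ւ ւ Փ փ Ք ք Օ օ Ֆ ֆ ՚ (7F unused)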

In this table, code value 21 is the eternity sign, which has, since 2013, a designated point in Unicode U+058E (LEFT-FACING ARMENIAN ETERNITY SIGN) and another for its right-facing variant: U+058D (RIGHT-FACING ARMENIAN ETERNITY SIGN).[1] Some mappings incorrectly claim that it has a code point of U+0530.

Code value 20 is the regular SPACE character; code values 00–1F and 7F are not assigned to characters by AST 34.005, though they may be the same as the ASCII control characters that are located in those positions.

Code value 22 is used to encode the Armenian ligature ew (և).[2] In some variants, it encodes the section sign (§) instead. It is therefore strongly recommended to encode this ligature as the pair of small letters ech (yech) and yiwn (vyun) and let the renderer generate the ligature, since software and fonts render code value 22 differently depending on which version of ArmSCII-7 they assume.
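In Unicode terms, this recommendation amounts to replacing U+0587 with U+0565 U+0582, which is exactly the compatibility decomposition recorded for U+0587 in the Unicode Character Database. A small Python sketch using only the standard `unicodedata` module; the function name `split_ew_ligature` is hypothetical.

```python
import unicodedata

LIGATURE_EW = "\u0587"  # և ARMENIAN SMALL LIGATURE ECH YIWN


def split_ew_ligature(text: str) -> str:
    """Replace every ligature ew with the letters ech (ե) + yiwn (ւ).

    Only U+0587 is touched: applying NFKD to the whole string would also
    rewrite unrelated compatibility characters.
    """
    # NFKD of the ligature alone yields its compatibility decomposition.
    return text.replace(LIGATURE_EW, unicodedata.normalize("NFKD", LIGATURE_EW))


print(unicodedata.decomposition(LIGATURE_EW))  # <compat> 0565 0582
print(split_ew_ligature("և") == "\u0565\u0582")  # True
```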

Code value 7F may be used sometimes as a substitution for the non-breaking space.

Note that the characters encoded at code values 2D and 7E (Armenian hyphen and apostrophe) may not be visible with all fonts supporting Armenian.

In ArmSCII-8 (below), the characters at code values 21–7E of this table are simply shifted by a fixed offset, to code values A1–FE.
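The shift can be sketched in Python as follows, assuming that only the graphic characters 21–7E move (by adding 80 hexadecimal) while SPACE and the unassigned values 00–1F and 7F are copied unchanged. The function name is hypothetical, and the informal use of 7F as a non-breaking space is ignored.

```python
def armscii7_to_armscii8(data: bytes) -> bytes:
    """Convert ArmSCII-7 bytes to ArmSCII-8 by the fixed 0x80 offset."""
    out = bytearray()
    for b in data:
        if b > 0x7F:
            raise ValueError(f"not a 7-bit code value: {b:#04x}")
        if 0x21 <= b <= 0x7E:
            out.append(b + 0x80)  # graphic characters: 21-7E -> A1-FE
        else:
            out.append(b)  # SPACE (20), 00-1F and 7F are kept as they are
    return bytes(out)


# Ayb (32), space (20), eternity sign (21), apostrophe (7E)
print(armscii7_to_armscii8(bytes([0x32, 0x20, 0x21, 0x7E])).hex())  # b220a1fe
```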

ArmSCII-8

AST 34.002:1997 (ArmSCII-8)
8-bit coded character set for Armenian
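   x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF
0x–1x unused
2x SP ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ] ^ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ~ (7F unused)
8x–9x unused
Ax NBSP ֎ և/§ ։ ) ( » « ― · ՝ , ‐ ֊ … ՜
Bx ՛ ՞ Ա ա Բ բ Գ գ Դ դ Ե ե Զ զ Է է
Cx Ը ը Թ թ Ժ ժ Ի ի Լ լ Խ խ Ծ ծ Կ կ
Dx Հ հ Ձ ձ Ղ ղ Ճ ճ Մ մ Յ յ Ն ն Շ շ
Ex Ո ո Չ չ Պ պ Ջ ջ Ռ ռ Ս ս Վ վ Տ տ
Fx Ր ր Ց ց Ւ ւ Փ փ Ք ք Օ օ Ֆ ֆ ՚ (FF unused)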

In this table, code value 20 is reserved for the regular SPACE character, code value A0 for the non-breaking space, and code value A1 for the eternity sign, which has, since 2013, a designated point in Unicode U+058E (LEFT-FACING ARMENIAN ETERNITY SIGN) and another for its right-facing variant: U+058D (RIGHT-FACING ARMENIAN ETERNITY SIGN).[1] Some mappings incorrectly claim that it has a code point of U+0530.

Code values 00–1F and 7F–9F are not assigned to characters by AST 34.002, though they may be the same as the ISO-8859-1 control characters that are located in those positions.

Code value A2 is used to encode the Armenian ligature ew (և).[2] In some variants, it encodes the section sign (§) instead. Some Armenian fonts display this ligature at the position of the ASCII ampersand, but it is strongly recommended to encode the ligature as the two standard Armenian small letters that compose it.

Code value FF may hold the small-letter form of the Armenian modifier apostrophe. That form has no mapping in Unicode and is usually approximated by the ASCII apostrophe. For correct rendering with Unicode fonts, it is suggested to represent it with code value FE instead and let ligature control adjust its position, since the small form only occurs after a small Armenian letter, while the apostrophe at FE occurs only after a capital Armenian letter. As a result, most implementations encode nothing at code value FF.

This standard is the only one that makes an apparent distinction for the "mirrored" Armenian parentheses, because it was created by simply remapping the ArmSCII-7 standard. However, many documents do not treat this as a meaningful distinction: the usual ASCII parentheses are far more common than the mirrored ones inherited from ArmSCII-7, simply because Armenian keyboards and editors using ArmSCII-8 generated the lower ASCII codes (whose usage is swapped in classical Armenian). Likewise, the duplicate of the ASCII comma at code value AB results from the same simple remapping of ArmSCII-7, so it does not differ from the ASCII comma that most ArmSCII-8 documents use.
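The loss of this distinction is easy to see in a decoder. The following Python sketch decodes ArmSCII-8 using the Unicode values from the mapping table later in this article, taking A2 as the ligature ew rather than the section-sign variant; the names `ARMSCII8_SYMBOLS` and `decode_armscii8` are hypothetical, and unassigned bytes are simply replaced by U+FFFD. Code values A4, A5 and AB decode to the same characters as the ASCII bytes 29, 28 and 2C, so the original byte cannot be recovered after decoding.

```python
# Unicode values for the ArmSCII-8 punctuation and symbols, per this
# article's mapping table (A2 taken as the ligature ew).
ARMSCII8_SYMBOLS = {
    0xA0: "\u00A0", 0xA1: "\u058E", 0xA2: "\u0587", 0xA3: "\u0589",
    0xA4: "\u0029", 0xA5: "\u0028", 0xA6: "\u00BB", 0xA7: "\u00AB",
    0xA8: "\u2015", 0xA9: "\u2024", 0xAA: "\u055D", 0xAB: "\u002C",
    0xAC: "\u2010", 0xAD: "\u058A", 0xAE: "\u2026", 0xAF: "\u055C",
    0xB0: "\u055B", 0xB1: "\u055E", 0xFE: "\u055A",
}


def decode_armscii8(data: bytes) -> str:
    """Decode ArmSCII-8 bytes to a Unicode string (sketch)."""
    chars = []
    for b in data:
        if b < 0x80:
            chars.append(chr(b))  # US-ASCII range, unchanged
        elif b in ARMSCII8_SYMBOLS:
            chars.append(ARMSCII8_SYMBOLS[b])
        elif 0xB2 <= b <= 0xFD:
            # Letters alternate capital/small in alphabetical order from B2.
            k, is_small = divmod(b - 0xB2, 2)
            chars.append(chr((0x0561 if is_small else 0x0531) + k))
        else:
            chars.append("\uFFFD")  # 80-9F and FF: not assigned here
    return "".join(chars)


print(decode_armscii8(b"\xa5\xb2\xa4") == decode_armscii8(b"(\xb2)"))  # True
```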

Note that the characters encoded at code values AD and FE (Armenian hyphen and apostrophe) may not be visible with all fonts supporting Armenian.

ArmSCII-8A

AST 34.001:1997 (ArmSCII-8A)
8-bit coded character set for Armenian
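   x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF
0x–1x unused
2x SP ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ] ^ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ~ (7F unused)
8x Ա ա Բ բ Գ գ Դ դ Ե ե Զ զ Է է Ը ը
9x Թ թ Ժ ժ Ի ի Լ լ Խ խ Ծ ծ Կ կ Հ հ
Ax Ձ ձ Ղ ղ Ճ ճ Մ մ Յ յ Ն ն Շ շ « »
Bx–Cx unused
Dx (D0–DB unused) ֎ ֊ … ՞
Ex Ո ո Չ չ Պ պ Ջ ջ Ռ ռ Ս ս Վ վ Տ տ
Fx Ր ր Ց ց Ւ ւ Փ փ Ք ք Օ օ Ֆ ֆ ՚ NBSP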

In this table, code value 20 is the regular SPACE character, and code value DC is the eternity sign, which has, since 2013, a designated point in Unicode U+058E (LEFT-FACING ARMENIAN ETERNITY SIGN) and another for its right-facing variant: U+058D (RIGHT-FACING ARMENIAN ETERNITY SIGN).[1] Some mappings incorrectly claim that it has a code point of U+0530.

Code values 00–1F, 7F, and B0–DB are not assigned to characters by AST 34.002, though they may be the same as those used in a legacy DOS/OEM codepage 437 (box drawing characters) or Macintosh Roman.

Note that the characters encoded at code values DD and FE (Armenian hyphen and apostrophe) may not be visible with all fonts supporting Armenian.
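Comparing the ArmSCII-8 and ArmSCII-8A charts, ArmSCII-8A keeps the letters Ո through ֆ at the same code values as ArmSCII-8 (E0–FD), while the letters Ա through շ are packed into 80–AD, 32 hexadecimal below their ArmSCII-8 positions (B2–DF). A Python sketch of the letter conversion, derived only from the two charts; the function names are hypothetical and punctuation is deliberately not handled.

```python
def armscii8a_letter_to_armscii8(b: int) -> int:
    """Map an ArmSCII-8A letter code value to the ArmSCII-8 code value."""
    if 0x80 <= b <= 0xAD:
        return b + 0x32  # Ա (80) .. շ (AD) -> B2 .. DF
    if 0xE0 <= b <= 0xFD:
        return b  # Ո (E0) .. ֆ (FD): identical in both encodings
    raise ValueError(f"not an ArmSCII-8A letter: {b:#04x}")


def armscii8_letter_to_armscii8a(b: int) -> int:
    """Inverse mapping, for ArmSCII-8 letters B2-FD."""
    if 0xB2 <= b <= 0xDF:
        return b - 0x32
    if 0xE0 <= b <= 0xFD:
        return b
    raise ValueError(f"not an ArmSCII-8 letter: {b:#04x}")


print(hex(armscii8a_letter_to_armscii8(0x89)))  # 0xbb (small ech)
```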

Support for the Armenian script in other standards

ISO 10585:1996

ISO 10585:1996
7-bit coded character set for Armenian
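   x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF
0x–1x unused
2x SP Ա Բ Գ Դ Ե Զ Է Ը Թ Ժ Ի Լ Խ Ծ Կ
3x Հ Ձ Ղ Ճ Մ Յ Ն Շ Ո Չ Պ Ջ Ռ Ս Վ Տ
4x Ր Ց Ւ Փ Ք Օ Ֆ (47 unused) ՝ ՚ ֊ (4B unused) ։ , ՞ ՟
5x (50 unused) ա բ գ դ ե զ է ը թ ժ ի լ խ ծ կ
6x հ ձ ղ ճ մ յ ն շ ո չ պ ջ ռ ս վ տ
7x ր ց ւ փ ք օ ֆ (77 unused) ― ‐ ″ (7B unused) · ՛ ՜ (7F unused)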

For comparison, this is the 7-bit encoding defined by the international standard ISO/IEC 10585, which was in use before the Armenian standard was revised as AST 34.002:1997 (ArmSCII-8).

In this standard (as in ISO/IEC 10646 and Unicode), only one Armenian apostrophe modifier letter is encoded, at code value 49, although Armenian uses two cased modifier-letter apostrophes. U+055A represents the capital apostrophe but is not treated as dual-cased in Unicode or in ISO 10585; the small-letter apostrophe is absent and is generally represented by the ASCII apostrophe U+0027 in Unicode documents.

The left half-ring punctuation (a modifier letter) and the eternity sign are also missing, and only one double quotation mark (U+2033) is encoded, at code value 7A, instead of the pair of guillemets used in the three ArmSCII variants.

However, this standard maps the Armenian full stop (whose glyph looks very close to the ASCII colon) at code value 4C and the Armenian abbreviation mark (which looks very similar to an angular grave accent) at code value 4F. The abbreviation mark is missing from all ArmSCII code charts, and the full stop is missing from ArmSCII-8A.

Note that the characters encoded at code values 49 and 4A (Armenian apostrophe and hyphen) may not be visible with all fonts supporting Armenian.

ISO/IEC 10646-1 and Unicode

Armenian[1][2]
       0 1 2 3 4 5 6 7 8 9 A B C D E F
U+053x (unassigned) Ա Բ Գ Դ Ե Զ Է Ը Թ Ժ Ի Լ Խ Ծ Կ
U+054x Հ Ձ Ղ Ճ Մ Յ Ն Շ Ո Չ Պ Ջ Ռ Ս Վ Տ
U+055x Ր Ց Ւ Փ Ք Օ Ֆ (unassigned) (unassigned) ՙ ՚ ՛ ՜ ՝ ՞ ՟
U+056x ՠ ա բ գ դ ե զ է ը թ ժ ի լ խ ծ կ
U+057x հ ձ ղ ճ մ յ ն շ ո չ պ ջ ռ ս վ տ
U+058x ր ց ւ փ ք օ ֆ և ֈ ։ ֊ (unassigned) (unassigned) ֍ ֎ ֏
Notes
1. As of Unicode version 11.0
2. Non-assigned code points (grey areas in the official Unicode Consortium code chart) are marked "(unassigned)".
Armenian subset of Alphabetic Presentation Forms[1]
U+FB1x ﬓ ﬔ ﬕ ﬖ ﬗ (U+FB13–FB17; U+FB00–FB12 and U+FB18–FB4F omitted)
Notes
1. As of Unicode version 11.0

For comparison, these are the Unicode code charts for Armenian.

Since Unicode 1.1, its encoding (except for the Armenian hyphen U+058A, added in Unicode 3.0) has been based on the earlier ISO 10585 7-bit international standard rather than on ArmSCII, which lacked a dozen characters present in ISO 10585. However, non-letters were reorganized by type, and some extensions were added for rare Armenian characters that were missing from all earlier 7-bit and 8-bit standards.
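Because ISO 10585 already lists the capital letters (21–46) and the small letters (51–76) in alphabetical order, its letters map onto the Unicode block by one constant offset: adding 510 hexadecimal to an ISO 10585 letter code gives its Unicode code point (21 + 510 = U+0531 Ա, 51 + 510 = U+0561 ա). A Python sketch covering letters only; the function name is hypothetical.

```python
def iso10585_letter_to_unicode(b: int) -> str:
    """Decode an ISO 10585 letter code value (sketch: letters only)."""
    if 0x21 <= b <= 0x46 or 0x51 <= b <= 0x76:
        return chr(b + 0x510)  # same alphabetical order as U+0531-U+0586
    raise ValueError(f"not an ISO 10585 letter: {b:#04x}")


print(iso10585_letter_to_unicode(0x21), iso10585_letter_to_unicode(0x76))  # Ա ֆ
```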

Capital letters are encoded in the first half of the block (terminated by modifier letters).

Lowercase letters are encoded in the second half of the block (terminated by Armenian punctuation signs).
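As a result, each capital letter U+0531–U+0556 and its small counterpart are exactly 30 hexadecimal apart. A short Python check, using only built-in string methods, confirms that this offset agrees with Python's own case mapping for the whole alphabet; `to_small_armenian` is a hypothetical helper.

```python
def to_small_armenian(ch: str) -> str:
    """Lowercase one Armenian capital (U+0531-U+0556) by the 0x30 offset."""
    cp = ord(ch)
    if 0x0531 <= cp <= 0x0556:
        return chr(cp + 0x30)
    return ch  # anything else is returned unchanged


capitals = [chr(cp) for cp in range(0x0531, 0x0557)]
print(len(capitals))  # 38
print(all(to_small_armenian(c) == c.lower() for c in capitals))  # True
```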

Unlike the ArmSCII encodings, this encoding is stable and portable across systems, and it contains all the characters needed for Armenian; the Armenian eternity sign, long the one exception, was allocated in 2013 at U+058E, with another code point for its right-facing variant at U+058D.[1] Some Unicode-encoded fonts for Armenian map the eternity sign to code point U+0530. This is incorrect, as U+0530 is not assigned to any character.
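Text produced with such fonts can be repaired mechanically. A minimal Python sketch, assuming the misplaced glyph was meant as the left-facing sign (the code point the mapping table below gives for the ArmSCII eternity sign); the function name is hypothetical. The standard `unicodedata` module confirms that U+0530 is unassigned (category Cn) while U+058E is a named character, provided Python's Unicode database is version 7.0 or later.

```python
import unicodedata


def fix_legacy_eternity_sign(text: str) -> str:
    """Move eternity signs from the unassigned U+0530 to U+058E."""
    return text.replace("\u0530", "\u058E")


print(unicodedata.category("\u0530"))  # Cn (unassigned)
print(unicodedata.name("\u058E"))  # LEFT-FACING ARMENIAN ETERNITY SIGN
print(fix_legacy_eternity_sign("\u0530") == "\u058E")  # True
```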

However, no distinction is kept for the Armenian (mirrored) parentheses, so the standard ASCII/Unicode parentheses must be used with their usual rendering. The left half-ring mark (a modifier letter) is encoded here, and some other marks are unified with those of other scripts (notably the quotation marks, middle dot and dashes).

Note that the characters at code points U+055A and U+058A (Armenian apostrophe and hyphen, as in the ArmSCII and ISO 10585 charts), as well as U+0559 (the numeric modifier mark, added specifically in ISO/IEC 10646-1 and Unicode), may not be visible with all fonts supporting Armenian.

Code mappings and classification

Note that some code values below are shown in parentheses. These are only approximate fallbacks; they do not map exactly to the intended character.

Character | Description or usage in Armenian | Short name | ArmSCII-7 | ArmSCII-8 | ArmSCII-8A | ISO 10585 | Unicode (ISO/IEC 10646) | Notes
General purpose
SP | space | space | 20 | 20 | 20 | 20 | 0020 | same as ASCII and Unicode
NBSP | non-breaking space | nbsp | (20) | A0 | FF | (20) | 00A0 | missing in ArmSCII-7 and ISO 10585
Armenian symbols
֎ | eternity sign | armeternity | 21 | A1 | DC |  | 058E | right-facing variant at U+058D
և | ligature ech yiwn (ew) | armew | (3B,75) | (26) (or BB,F5) | (26) (or 89,F5) | (55,72) | 0587 (or 0565,0582) | specific to Armenian: compatibility ligature of the Armenian small letters ech (yech) and yiwn (vyun), used as a symbol (similar to the ampersand in ASCII)
§ | section sign | armsection | 22 | A2 |  |  | 00A7 | from ISO 8859; encoded at 22 and A2 only in some ArmSCII variants (others use these code values for the ligature ew); missing in ArmSCII-8A and ISO 10585
Armenian punctuation
։ | full stop (vertsaket) | armfullstop | 23 | A3 | (3A) | 4C | 0589 | specific to Armenian: looks much like the ASCII colon, but its usage differs; missing in ArmSCII-8A (approximated by the ASCII colon)
) | right parenthesis | armparenright | 24 | A4 | 29 | (79) | 0029 | from ASCII and Unicode, with a different name and usage; missing in ISO 10585 (suggested substitution uses dashes)
( | left parenthesis | armparenleft | 25 | A5 | 28 | (79) | 0028 | from ASCII and Unicode, with a different name and usage; missing in ISO 10585 (suggested substitution uses dashes)
» | right quotation mark | armquotright | 26 | A6 | AF | (7A) | 00BB | from ISO 8859 and Unicode, with a different name and usage
« | left quotation mark | armquotleft | 27 | A7 | AE | (7A) | 00AB | from ISO 8859 and Unicode, with a different name and usage
″ | quotation mark |  |  | (22) | (22) | 7A | 2033 | used for either the left or the right quotation mark in ISO 10585; missing in ArmSCII-8 and ArmSCII-8A (approximated by the ASCII double quotation mark)
― | em-dash | armemdash | 28 | A8 | (5F) | 78 | 2015 | from ISO 8859; missing in ArmSCII-8A (approximated by the ASCII underscore)
. | middle dot (mijaket) | armdot | 29 | A9 | (2E) | 7C | 2024 | sometimes similar to the ASCII full stop, but used differently in Armenian, where the middle dot is preferred; missing in ArmSCII-8A (approximated by the ASCII full stop)
՝ | separation mark (but) | armsep | 2A | AA | (60) | 48 | 055D | usage specific to Armenian: used as a comma (= bowt); missing in ArmSCII-8A (approximated by the ASCII backquote)
, | comma | armcomma | 2B | AB | 2C | 4D | 002C | same as the ASCII and Unicode comma
‐ | dash | armendash | 2C | AC | (2D) | 79 | 2010 | similar to the short variant of the ASCII and Unicode hyphen-minus (shorter than the general-purpose minus sign used in ASCII); missing in ArmSCII-8A (approximated by the ASCII hyphen-minus)
Armenian modifier letters
֊ | hyphen (yentamna) | armyentamna | 2D | AD | DD | 4A | 058A | specific to Armenian: a modifier letter that modifies another normal Armenian letter (possibly with combining punctuation between them)
… | ellipsis | armellipsis | 2E | AE | DE | (7C,7C,7C) | 2026 | from ISO 8859, but not a punctuation mark here: a modifier letter that follows and modifies another normal Armenian letter (possibly with combining punctuation between them)
ՙ | numeric mark (left half-ring) | armnum |  |  |  |  | 0559 | specific to Armenian: a modifier letter that modifies another normal Armenian letter (possibly with combining punctuation between them); missing in all ArmSCII variants
՚ | apostrophe (right half-ring) | armapostrophe | 7E | FE | FE | 49 | 055A | specific to Armenian: a modifier letter that modifies another normal Armenian letter (possibly with combining punctuation between them)
Armenian combining punctuation
՜ | exclamation mark (amanak) | armexclam | 2F | AF | (7E) | 7E | 055C | specific to Armenian: these diacritics encode punctuation but may appear above a letter in the middle of any word (they may be ignored in searches); Unicode handles them as modifier letters, although they are normally non-spacing; = batsaganchakan nshan; missing in ArmSCII-8A (approximated by the ASCII tilde)
՛ | emphasis mark (shesht) | armaccent | 30 | B0 | (27) | 7D | 055B | specific to Armenian: these diacritics encode punctuation but may appear above a letter in the middle of any word (they may be ignored in searches); Unicode handles them as modifier letters, although they are normally non-spacing; missing in ArmSCII-8A (approximated by the ASCII single quote)
՞ | question mark (paruyk) | armquestion | 31 | B1 | DF | 4E | 055E | specific to Armenian: these diacritics encode punctuation but may appear above a letter in the middle of any word (they may be ignored in searches); Unicode handles them as modifier letters, although they are normally non-spacing; = hartsakan nshan
՟ | abbreviation mark (patiw) | armabbrev |  |  |  | 4F | 055F | specific to Armenian: these diacritics encode punctuation but may appear above a letter in the middle of any word (they may be ignored in searches); Unicode handles them as modifier letters, although they are normally non-spacing
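Armenian capital letters
Ա | Ayb | Armayb | 32 | B2 | 80 | 21 | 0531
Բ | Ben | Armben | 34 | B4 | 82 | 22 | 0532
Գ | Gim | Armgim | 36 | B6 | 84 | 23 | 0533
Դ | Da | Armda | 38 | B8 | 86 | 24 | 0534
Ե | Ech (Yech) | Armyech | 3A | BA | 88 | 25 | 0535
Զ | Za | Armza | 3C | BC | 8A | 26 | 0536
Է | Eh (E) | Arme | 3E | BE | 8C | 27 | 0537
Ը | Et (At) | Armat | 40 | C0 | 8E | 28 | 0538
Թ | To | Armto | 42 | C2 | 90 | 29 | 0539
Ժ | Zhe | Armzhe | 44 | C4 | 92 | 2A | 053A
Ի | Ini | Armini | 46 | C6 | 94 | 2B | 053B
Լ | Liwn (Lyun) | Armlyun | 48 | C8 | 96 | 2C | 053C
Խ | Xeh (Khe) | Armkhe | 4A | CA | 98 | 2D | 053D
Ծ | Ca (Tsa) | Armtsa | 4C | CC | 9A | 2E | 053E
Կ | Ken | Armken | 4E | CE | 9C | 2F | 053F
Հ | Ho | Armho | 50 | D0 | 9E | 30 | 0540
Ձ | Ja (Dza) | Armdza | 52 | D2 | A0 | 31 | 0541
Ղ | Ghad (Ghat) | Armghat | 54 | D4 | A2 | 32 | 0542
Ճ | Cheh (Tche) | Armtche | 56 | D6 | A4 | 33 | 0543
Մ | Men | Armmen | 58 | D8 | A6 | 34 | 0544
Յ | Yi (Hi) | Armhi | 5A | DA | A8 | 35 | 0545
Ն | Now (Nu) | Armnu | 5C | DC | AA | 36 | 0546
Շ | Sha | Armsha | 5E | DE | AC | 37 | 0547
Ո | Vo | Armvo | 60 | E0 | E0 | 38 | 0548
Չ | Cha | Armcha | 62 | E2 | E2 | 39 | 0549
Պ | Peh (Pe) | Armpe | 64 | E4 | E4 | 3A | 054A
Ջ | Jheh (Je) | Armje | 66 | E6 | E6 | 3B | 054B
Ռ | Ra | Armra | 68 | E8 | E8 | 3C | 054C
Ս | Seh (Se) | Armse | 6A | EA | EA | 3D | 054D
Վ | Vew (Vev) | Armvev | 6C | EC | EC | 3E | 054E
Տ | Tiwn (Tyun) | Armtyun | 6E | EE | EE | 3F | 054F
Ր | Reh (Re) | Armre | 70 | F0 | F0 | 40 | 0550
Ց | Co (Tso) | Armtso | 72 | F2 | F2 | 41 | 0551
Ւ | Yiwn (Vyun) | Armvyun | 74 | F4 | F4 | 42 | 0552
Փ | Piwr (Pyur) | Armpyur | 76 | F6 | F6 | 43 | 0553
Ք | Keh (Ke) | Armke | 78 | F8 | F8 | 44 | 0554
Օ | Oh (O) | Armo | 7A | FA | FA | 45 | 0555
Ֆ | Feh (Fe) | Armfe | 7C | FC | FC | 46 | 0556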
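Armenian small letters
ա | ayb | armayb | 33 | B3 | 81 | 51 | 0561
բ | ben | armben | 35 | B5 | 83 | 52 | 0562
գ | gim | armgim | 37 | B7 | 85 | 53 | 0563
դ | da | armda | 39 | B9 | 87 | 54 | 0564
ե | ech (yech) | armyech | 3B | BB | 89 | 55 | 0565
զ | za | armza | 3D | BD | 8B | 56 | 0566
է | eh (e) | arme | 3F | BF | 8D | 57 | 0567
ը | et (at) | armat | 41 | C1 | 8F | 58 | 0568
թ | to | armto | 43 | C3 | 91 | 59 | 0569
ժ | zhe | armzhe | 45 | C5 | 93 | 5A | 056A
ի | ini | armini | 47 | C7 | 95 | 5B | 056B
լ | liwn (lyun) | armlyun | 49 | C9 | 97 | 5C | 056C
խ | xeh (khe) | armkhe | 4B | CB | 99 | 5D | 056D
ծ | ca (tsa) | armtsa | 4D | CD | 9B | 5E | 056E
կ | ken | armken | 4F | CF | 9D | 5F | 056F
հ | ho | armho | 51 | D1 | 9F | 60 | 0570
ձ | ja (dza) | armdza | 53 | D3 | A1 | 61 | 0571
ղ | ghad (ghat) | armghat | 55 | D5 | A3 | 62 | 0572
ճ | cheh (tche) | armtche | 57 | D7 | A5 | 63 | 0573
մ | men | armmen | 59 | D9 | A7 | 64 | 0574
յ | yi (hi) | armhi | 5B | DB | A9 | 65 | 0575
ն | now (nu) | armnu | 5D | DD | AB | 66 | 0576
շ | sha | armsha | 5F | DF | AD | 67 | 0577
ո | vo | armvo | 61 | E1 | E1 | 68 | 0578
չ | cha | armcha | 63 | E3 | E3 | 69 | 0579
պ | peh (pe) | armpe | 65 | E5 | E5 | 6A | 057A
ջ | jheh (je) | armje | 67 | E7 | E7 | 6B | 057B
ռ | ra | armra | 69 | E9 | E9 | 6C | 057C
ս | seh (se) | armse | 6B | EB | EB | 6D | 057D
վ | vew (vev) | armvev | 6D | ED | ED | 6E | 057E
տ | tiwn (tyun) | armtyun | 6F | EF | EF | 6F | 057F
ր | reh (re) | armre | 71 | F1 | F1 | 70 | 0580
ց | co (tso) | armtso | 73 | F3 | F3 | 71 | 0581
ւ | yiwn (vyun) | armvyun | 75 | F5 | F5 | 72 | 0582
փ | piwr (pyur) | armpyur | 77 | F7 | F7 | 73 | 0583
ք | keh (ke) | armke | 79 | F9 | F9 | 74 | 0584
օ | oh (o) | armo | 7B | FB | FB | 75 | 0585
ֆ | feh (fe) | armfe | 7D | FD | FD | 76 | 0586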


References

  1. "ISO/IEC 10646:2012/Amd.1: 2013 (E)" (PDF).
  2. "ՀՍՏ 34.002-98" (PDF). Retrieved 18 July 2010.

Further reading

  • [ArmSCII] Armenian Standard Code for Information Interchange—Center of Humane Technologies "Armenian Computer", June 1991.
  • [AST 34.001-97] Information Technologies—Character Set And Information Encoding: Character Set—State Standardization Committee of the Republic of Armenia, July 1997.
  • [ArmSCII Version 2] Armenian Standard Code for Information Interchange, Version 2—ArmSCII Working Group, May 1999.
  • https://www.math.nmsu.edu/~mleisher/Software/csets/ARMSCII-7.TXT ARMSCII-7.TXT Armenian Standard Code for Information Interchange 1999, 7-bit encoding for transmission (2000-11-13)
  • https://www.math.nmsu.edu/~mleisher/Software/csets/ARMSCII-8.TXT ARMSCII-8.TXT Armenian Standard Code for Information Interchange 1999, 8-bit encoding for Windows and Unix. (2000-11-13)
  • https://www.math.nmsu.edu/~mleisher/Software/csets/ARMSCII-8A.TXT ARMSCII-8A.TXT Armenian Standard Code for Information Interchange 1999, alternative 8-bit encoding for DOS and Macintosh. (2000-11-13)
  • https://www.math.nmsu.edu/~mleisher/Software/csets/AST166-7.TXT AST166-7.TXT Armenian national standard AST166.1997, 7-bit encoding for transmission. (superseded by ARMSCII-7)
  • https://www.math.nmsu.edu/~mleisher/Software/csets/AST166-8.TXT AST166-8.TXT Armenian national standard AST166.1997, 8-bit encoding for Windows and Unix. (superseded by ARMSCII-8)
  • https://www.math.nmsu.edu/~mleisher/Software/csets/AST166-A.TXT AST166-A.TXT Armenian national standard AST166.1997, "A" encoding for DOS and MacOS. (superseded by ARMSCII-8A)
  • Savard, John J. G. (2018) [2005]. "Computer Arithmetic". quadibloc. The Early Days of Hexadecimal. Archived from the original on 2018-07-16. Retrieved 2018-07-16. (NB. Has info on ARMSCII.)
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.