Urdu alphabet

The Urdu alphabet (Urdu: اردو‌ حروف‌تہجی simplified script: اردو حروف تہجی Urdu harūf tahajī, or اردو‌تہجی Urdu tahajī) is the right-to-left alphabet used for the Urdu language. It is a modification of the Persian alphabet, which is itself a derivative of the Arabic alphabet. The Urdu alphabet has 39[1] or 40 letters[2] plus digraphs. It has no distinct letter cases and is typically written in the calligraphic Nastaliq script, whereas Arabic is more commonly written in the Naskh style.

Example of writing in the Urdu alphabet: Urdu
Type: Abjad
Languages: Urdu, Balti, Burushaski, others
Parent systems: Proto-Sinaitic
Unicode range: U+0600 to U+06FF, U+0750 to U+077F, U+FB50 to U+FDFF, U+FE70 to U+FEFF
The Urdu alphabet, depicted with character names.

Usually, bare transliterations of Urdu into Roman letters (called Roman Urdu) omit many phonemic elements that have no equivalent in English or other languages commonly written in the Latin script. The National Language Authority of Pakistan has developed a number of systems with specific notations to signify non-English sounds, but these can only be properly read by someone already familiar with the loan letters.

The standard Urdu script is a modified version of the Perso-Arabic script and has its origins in 13th-century Iran. It is closely related to the development of the Nastaʻliq style of Perso-Arabic script. Urdu script in its extended form is known as Shahmukhi script and is also used for writing other Indo-Aryan languages of the northern Indian subcontinent, such as Punjabi and Saraiki.

Urdu and Hindi are mutually intelligible as spoken languages, or when written in the Latin alphabet. The most obvious distinction between Hindi and Urdu is the script. Both scripts have religious connotations.

Alphabet

The Urdu script is an abjad script derived from Perso-Arabic script, which is itself a derivative of the Arabic script. The Urdu alphabet was standardized in 2004 by the National Language Authority, which is responsible for standardizing Urdu in Pakistan. According to the National Language Authority, Urdu has 58[1] letters, of which 39[1] are basic letters while 18[1] are digraphs that represent aspirated consonants, formed by joining basic consonant letters to a variant of he called do chashmi he.[1][3][4][2] Tāʼ marbūṭah is also sometimes considered a letter, though it is rarely used except in certain loan words from Arabic.

As an abjad, the Urdu script shows only consonants and long vowels; short vowels can only be inferred from the consonants' relation to each other. While this type of script is convenient for Semitic languages like Arabic and Hebrew, whose consonant roots are the key to the sentence, Urdu is an Indo-European language and does not have the same luxury, so reading it requires more memorization. Urdu uses the vowels represented as full letters ا و ی ے more often than Arabic does, so there are fewer short vowels to omit. Also, hamza ئ and the madda on alif madda آ are not omitted. Words that differ only by omitted short vowels are rarer in Urdu than in Arabic, but their meanings are often far more divergent than those of Arabic words with the same root.
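The omission of short vowels described above can be sketched in Python. This is only an illustration: it assumes the short-vowel marks are the Unicode combining characters U+064E (zabar), U+0650 (zer) and U+064F (pesh), and the function name `strip_short_vowels` is hypothetical. The marks are removed, while full letters, including the precomposed alif madda آ (U+0622), are left untouched.

```python
import unicodedata

# Short-vowel marks (assumed set for this sketch: zabar, zer, pesh).
SHORT_VOWEL_MARKS = {"\u064E", "\u0650", "\u064F"}


def strip_short_vowels(text: str) -> str:
    """Drop zabar, zer and pesh, as everyday Urdu writing usually does."""
    # NFC keeps alif madda as the single code point U+0622, so its madda
    # is never separated from the alif and is not removed here.
    composed = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in composed if ch not in SHORT_VOWEL_MARKS)


with_vowel = "\u0627\u064E\u0628"         # alif + zabar + be ("ab")
azadi = "\u0622\u0632\u0627\u062F\u06CC"  # the word for "freedom"

print(strip_short_vowels(with_vowel) == "\u0627\u0628")  # True
print(strip_short_vowels(azadi) == azadi)                # True
```

Other marks such as jazm or tashdid are deliberately left alone in this sketch; a fuller treatment would need to decide which of them count as omissible.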

Letter names and phonemes

The number of letters in the Urdu alphabet is more ambiguous than the 26 in the English alphabet; the most commonly quoted numbers are 39 and 40. The usual letter forms in Urdu Nastaliq are somewhat more diverse than for most Arabic-derived alphabets [see below].

Letter names and phonemes table:
Columns, in order: number in alphabet; isolated letter form [footnote]; sound (ALA‑LC [7], Hunterian [8], IPA, closest sound in English); name [5][6] [glossary of key words below] (Nastaliq [footnote 1], full diacritics [6], romanizations [6][1]); Unicode [0][6]
1 1 ا /ɑː/ /ā/
/ʔ/ /–/

الف الِف alif [6] / alef [9] U+0627 ..
[footnote] 1a آ الف مدہ الِف مَدّه alif maddah [6]
alef with madda above [9]
U+0622 ..
2 2 ب b /b/ بے بے [6] / beh [9] ..
3 3 پ p /p/ پے پے [6] / peh [10] ..
4 4 ت t /t̪/ تے تے [6] / teh [9] ..
5 5 ٹ t /ʈ/ ٹے ٹے ṭē [6] / tteh [10] ..
6 6 ث s /s/ ثے ثے [6] / s̱ē ..
7 7 ج j /d͡ʒ/ جيم جِيم jīm [6] / jeem [9] ..
8 8 چ c ch /t͡ʃ/ چے چے čē [6] / [6] / tcheh [10] ..
9 9 ح h /ɦ/ بڑی حے بَڑی حے baṛī ħē [6] / baṛī ḥē   ..
10 10 خ k͟h kh /x/ خے خے [6] / khē [6]
khah [9] / k͟hē  
..
11 11 د d /d/ دال دال dāl [6] / dal [9]   ..
12 12 ڈ d /ɖ/ ڈال ڈال ḍāl [6] / ddal [10] ..
13 13 ذ z /z/ ذال ذال zāl [6] / ẕāl ..
14 14 ر r /r/ رے رے [6] / reh [9]   ..
15 15 ڑ r /ɽ/ ڑے ڑے ṛē [6] / rreh [10]   ..
16 16 ز z /z/ زے زے [6] / zey   ..
17 17 ژ zh zh /ʒ/ ژے ژے žē [6] / zhē [6]   ..
18 18 س s /s/ سین سِين sīn [6] / seen [9]   ..
19 19 ش sh sh /ʃ/ شین شِين  šīn [6] / shīn [6] / sheen [9]   ..
20 20 ص s /s/ صاد صْواد  swād [6] / sad [9] / ṣwād   ..
21 21 ض z /z/ ضاد ضْواد żwād [6] / ẓwād   ..
22 22 ط t /t/ طوے طوئے tō’ē [6] / tah [9]
tōē / t̤o'ē / toy
..
23 23 ظ
[footnote]
z /z/ ظوے ظوئے zō’ē [6] / zōē / z̤o'ē   ..
24 24 ع ʻ / ‘ / ` / ' / ’ / ʼ
[footnote 3]
/ɑː/ /oː/ /eː/
/ʔ/ /ʕ/ /∅/
عین عَيْن ‘ain [6] / ain [9] ayn   ..
25 25 غ g͟h gh /ɣ/ غین غَيْن  ğain [6] / ghain [9] / g͟hain ..
26 26 ف f /f/ فے فے [6] / feh [9] ..
27 27 ق q /q/ قاف قاف qāf [6] / qaf [9] ..
28 28 ک k /k/ کاف کاف kāf [6] / kāf   ..
29 29 گ g /ɡ/ گاف گاف gāf [6] / gaf [10]   ..
30
[footnote]
30 ل l /l/ لام لام lām [6] / lam [9]   ..
31 31 م m /m/ میم مِيم mīm [6] / meem [9]   ..
32 32 ن n /n/ /ɲ/
/ɳ/ /ŋ/
نون نُون nūn [6]noon [9]   ..
33 32a ں ٘ n / ◌̃ / نون غنہ نُونِ غُنّہ nūn-e ğunnah [6]
noon ghunna [10]
nūn g͟hunnah
..
34 33 و v or
ū/u/o/au
w or
ū/u/o/au
/ʋ/ /ʊ/ /uː/ /oː/ /ɔː/ واؤ واؤ vāō [6] /  wāō [6]
waw [9] / wā'o
..
35 34 ہ h / ā / e /ɦ/ /ɑː/ /e:/ گول ہے گول ہے gōl hē [6]
heh goal [10]
gol hē
..
چھوٹی ہے چھوٹی ہے čhōṭī hē [6]
 choṭī hē
Arabic
[footnote]
34a ه چهوٹی هے چهوٹی هے ..
36 34b ھ h /ʰ/ or /ʱ/ دو چشمی ہے دوچَشْمی ہے dō-čašmī hē [6]
do-cashmī hē
heh doachashmee [10]
..
دو چشم ه دو چشم ه do-cashm hē ?
do-cashmi hē ?
38 35 ی y / ī / á /j/ /iː/ /ɑː/ چھوٹی يے چھوٹی يے čhōṭī yē [6]
choṭī yē  
..
39 35b ے ai / e /ɛː/ /eː/ بڑی يے بَڑی يے baṛī yē [6]
yeh barree [10]
..
Hamza ہمزہ ہَمْزه hamza [9] hamzah
37 0 & 35 ء ʼ / – / yi
[footnote 2]
/ʔ/ /∅/ ___ ___ hamza [9]
(hamza on the line)  
..
ٴ ___ ___ hamza diacritic   ..

..

ئ ___ ___  yeh with hamza above [9]
yē hamza  
..
___ ___ alif hamza
ۓ ___ ___  yeh barree with hamza above [10]
baṛī yē hamza  
..
33a ؤ واوِ مَہْمُوز واوِ مَہْمُوز [6] vāv-e mahmūz [6]
waw with hamza above [9]
..

..

ۂ ___ ___ heh goal with hamza above [10] ..
Ta marbuutah
Arabic
[footnote]
ۃ ___ ___ teh marbuta goal [10] ..
ة تاء مربوطة تَاء مَرْبُوطَة teh marbuta [9]
Tāʼ marbūṭah  
..
Footnotes for letter names and phonemes:

^nastaliq  0. Nastaliq Urdu has more forms of some letters than are typically found in Arabic.

^ 1. This may display in a different style if you do not have a Nastaliq font installed. [endnote]

^ 2. Hamza

In Urdu, hamza ء is silent in all its forms except when it is used as hamza-e-izafat. The main use of hamza ء in Urdu is to indicate a vowel cluster. It is sometimes transliterated as "2" in informal Arabic, but not in Urdu.

In Urdu words, Hamza ء is always attached ئ to a form resembling the Arabic ى alef maksura. Some fonts convert an isolated Hamza in this form to Hamza on the line.

Hamza can be difficult to recognise in Urdu handwriting and in fonts designed to replicate it, where it closely resembles the two dots above featured in ت Té and ق Qaf, whereas in Arabic and geometric fonts it is more distinct and closely resembles the western form of the numeral 2.

^ 3. Ayn ع in its initial عـ and final ـع position is usually silent in pronunciation and is replaced by the sound of its preceding or succeeding vowel. When it appears in the middle of a word there are a few different, similar looking, characters used to represent it in the Latin alphabet: (`) the grave accent, (‘) the left single quotation mark, (') the apostrophe, or the Pacific (ʻ) okina, or it can be pronounced like Arabic hamza (ʼ) and be transliterated as equivalent marks in the reverse direction such as (’) the right single quotation mark. Sometimes transliterated as "3" in informal Arabic but not in Urdu.

^4. Most vowel diacritics are omitted in most Urdu writing, but Urdu writing usually does distinguish alif madd and includes hamza over baṛī ye, gol he, and wāʾo. For example, alif madd and bare alif in آزادی ("āzādī", ɑ:zɑ:d̪i, freedom[11]) are distinguished in most contexts.

^Z. These excess diacritics do not reflect any significant difference in pronunciation between the letters ذ ز ض ظ all shown as "z" in other systems.

^5. Gol he and do-cashmi he diverged from the Arabic letter he. Sometimes choti he is used to refer to gol he, while sometimes choti he refers to the Arabic version. The distinction is somewhat artificial, since gol he is an equivalent letter to the Arabic letter, but they have separate Unicode characters. Some fonts make the Arabic he look the same as gol he or do-cashmi he.

^6. Tāʼ marbūṭah is also sometimes considered a letter, though it is rarely used except for in certain loan words from Arabic.

^7. Tāʼ marbūṭah is regarded as a form of tā, the Arabic version of Urdu té, but it is not pronounced as such, and when replaced with an Urdu letter in naturalised loan words it is usually replaced with gol he.

^[0.] These are illustrative only; in typing and typesetting the pre-combined characters in column 2 are used. Consonant diacritics are from the Unicode set "Arabic pedagogical symbols"[12]. U+25CC "DOTTED CIRCLE", U+00A0 "NO-BREAK SPACE", and U+0640 "ARABIC TATWEEL" are used to show diacritic positions. Tatweel doesn't work in Nastaliq fonts.

^Shapes: Skeleton characters that do not appear in the alphabet are "DOTLESS BEH" U+066E, "DOTLESS QAF" U+066F, and "DOTLESS FEH" U+06A1. These are not used in Urdu but were used historically in very early (rasm) versions of Arabic writing.


Variations
The Urdu alphabet isolated forms in 3 styles
Nastaliq [footnote 1]  ے ی و ھ ہ ن ں م ل گ ک ق ف غ ع ظ ط ض ص ش س ژ ز ڑ ر ذ ڈ د خ ح چ ج ث ٹ ت پ ب آ ا ئ ء
simplified [footnote 2] ء ئ ا آ ب پ ت ٹ ث ج چ ح خ د ڈ ذ ر ڑ ز ژ س ش ص ض ط ظ ع غ ف ق ک گ ل م ن ں ہ ھ و ی ے
Naskh [footnote 3] ء ئ ا آ ب پ ت ٹ ث ج چ ح خ د ڈ ذ ر ڑ ز ژ س ش ص ض ط ظ ع غ ف ق ک گ ل م ن ں ہ ھ و ی ے
The Urdu alphabet isolated forms in 17 fonts
Noto Nastaliq Urdu
Urdu Typesetting
Scheherazade
Lateef
Noto Naskh Arabic
Markazi Text
Noto Sans Arabic
Baloo Bhaijaan
El Messiri
Lemonada
Changa
Mada
Noto Kufi Arabic
Reem Kufi
Lalezar
Jomhuria
Rakkas

Footnotes:

^Footnote 1. These may display in different styles, depending on which fonts you have installed on your device. Compare with the image: its first two lines are in Nastaliq fonts, "Noto Nastaliq Urdu" from Google's Noto fonts collection and "Urdu Typesetting" from Microsoft.

^Footnote 2. Simplified geometric font styles are rarely used for Urdu; they are more commonly used for other languages such as Arabic and Farsi. Many of these fonts nevertheless support the full Urdu alphabet, such as "Baloo Bhaijaan" (yellow font in the image), which was specifically designed for Urdu by the India-based typeface foundry Ek Type, and Microsoft's "Tahoma".

^Footnote 3. Naskh styles are usually not a first choice for Urdu publishing, but Naskh fonts are often used where a more characteristic Urdu font is unavailable. Naskh fonts have been available much longer than Nastaliq fonts, and they work better than Nastaliq fonts where display size or processing power is limited, such as on mobile phones or older computers.

words from letter names


Letter(s) Word other uses
isolated
form
Urdu
name
Romanised
name
Urdu IPA transliteration          translation          Urdu
pronunciation translation
ی چھوٹی یے čhōṭī yē چھوٹی tʃʰoːʈi [13] choti small / minor /
junior [13][14]
چھوٹی آنت small intestine
ہ چھوٹی ہے čhōṭī hē [[ ]]
گول ہـے gōl hē گول goːl [15] gōl round / spherical / vague / silly / obese [15] گول گپے gol gappay panipuri
ھ دوچَشْمی ہے dō-čašmī hē دوچَشْمی do-cashmī دو چشمی دوربین binoculars
دوربین Telescope
دو 2 / two [[ ]]
چَشْمی [[ ]]
چشم /tʃəʃm/ [16] the eye / hope / expectation [16] [[ ]]
ح بَڑی حے baṛī ħē بَڑی bəɽi [17] baṛī /
bari
big / elder [17] بڑی آنت large intestine
ے بَڑی يـے baṛī yē [[ ]]
ں نُونِ غُنّہ nūn-e ğunnah غُنّہ ɣʊnnɑ [18] ğunnah / g͟hunnah nasal sound
or twang [18]
[[ ]]
آ الِف مَدّه alif maddah مَدّه maddah [[ ]]
ؤ واوِ مَہْمُوز vāv-e mahmūz مَہْمُوز mæhmuːz [19] mahmūz defective / improper [19] [[ ]]
ا آ ب پ ت ٹ ث ب ج چ خ ح د ڈ ذ ر ڑ ز ژ س ش ص ض ط ظ ع غ ف ق ک گ ل م ن ں و ہ ھ ی ے ء _____ [[ ]]
حَرْف /hərf/ [20] "letter of the alphabet" / handwriting / statement / blame / stigma [20] [[ ]]
حُرُوف /hʊruːf/ [21] letters (plural) [21] [[ ]]
_____ [[ ]]
اشْكال [[ ]]
اِسْم [[ ]]
_____ [[ ]]

Nastaliq

The Nastaliq calligraphic writing style (sometimes spelled Nastaʿlīq) began as a Persian mixture of the Naskh and Ta'liq scripts. After the Mughal conquest, Nastaʻliq became the preferred writing style for Urdu. It is the dominant style in Pakistan, and many Urdu writers elsewhere in the world use it. Nastaʿlīq is more cursive and flowing than its Naskh counterpart.

Letter forms


For the Arabic alphabet and many others derived from it, letters are regarded as having two or three general forms each, based on their position in the word, though Arabic calligraphy can add a great deal of complexity. The Nastaliq style in which Urdu is written, however, uses more than three general forms for many letters, even in the most mundane or informal documents.
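The basic positional forms can be requested from a font by placing a zero-width joiner (U+200D) next to a letter, which is how the letter forms table further down shows its end, mid and start columns. The following Python sketch illustrates the idea under stated assumptions: the function name `positional_forms` and the set of letters that join only to the preceding letter are chosen for this example, and the extra context-dependent shapes of a real Nastaliq font cannot be selected this way.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

# Letters that join only to the preceding letter (assumed set for this sketch):
# alif, alif madda, dal, ddal, zal, re, rre, ze, zhe, wao, bari ye.
NON_FORWARD_JOINING = set(
    "\u0627\u0622\u062F\u0688\u0630\u0631\u0691\u0632\u0698\u0648\u06D2"
)


def positional_forms(letter: str) -> dict:
    """Return strings that ask a font for each basic positional form."""
    # In logical (typing) order, a joiner before the letter joins it to
    # whatever precedes it, which yields the word-final shape.
    forms = {"isolated": letter, "final": ZWJ + letter}
    if letter not in NON_FORWARD_JOINING:
        forms["medial"] = ZWJ + letter + ZWJ
        forms["initial"] = letter + ZWJ
    return forms


for name, text in positional_forms("\u0628").items():  # the letter be
    print(name, [f"U+{ord(c):04X}" for c in text])
```

Printing code points rather than the strings themselves keeps the output readable in terminals that do not shape Arabic script.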


The names of 10 Urdu letters to show the variable letter forms in context. First and fourth rows: isolated letters, second and fifth rows: the names of those letters, third and sixth rows: the letter names broken up into individual letters.



Letter construction

Nastaliq isolated forms ء   ا ے ى ں ٮ ح س ص ط ع ڡ ٯ ک ل م د ر و ھ ہ ه isolated forms
Naskh forms isolated ء ا ے ى ں ٮ ح س ص ط ع ڡ ٯ ک ل م د ر و ھ ہ ه
start ء ا ىـ حـ سـ صـ طـ عـ ڡـ ٯـ کـ لـ مـ د ر و ھـ ہـ هـ start
mid ء ـا ـىـ  ـحـ ـسـ ـصـ ـطـ ـعـ ـڡـ ـٯـ ـکـ ـلـ ـمـ ـد ـر ـو ـھـ ـہـ ـهـ mid
end ء ـا ـے ـى ـں ـٮ ـح ـس ـص ـط ـع ـڡ ـٯ ـک ـل ـم ـد ـر ـو ـھ ـہ ـه end
unicode (i) 0621 .. 0627 .. 06D2 .. 0649 .. 06BA .. 066E .. 062D .. 0633 .. 0635 .. 0637 .. 0639 .. 06A1 .. 066F .. 06A9 .. 0644 .. 0645 .. 062F .. 0631 .. 0648 .. 06BE .. 06C1 .. 0647 .. unicode (i)
i'jam (marks added to the base shapes), with the letters they form:
nūn ghūnna diacritic (ii): ٘ (0658)
none: ء ا ے ی ں ح س ص ط ع ک ل م د ر و ھ ہ ه (ی 06CC, ں 06BA)
tōē above: ٹ ڈ ڑ
one dot below: ب ج
one dot above: ن خ ض ظ غ ف ذ ز
two dots below (in initial and middle positions only): یـ ـیـ (06CC)
two dots above: ت ق ـۃ ـة
three dots below, pointing down: پ چ
three dots above: ث ش ژ
line above (overline): گ
madda (small high, non-spacing): ۤ آ (0622)
hamza: ٴ (0674, high hamza), ــٔـ (0654, hamza above), ۓ (06D3), ئ (0626), ؤ (0624), ۂ (06C2)

^i. The ijam diacritic characters are illustrative only; in most typesetting the precomposed characters in the middle of the table are used.

^ii. Nūn Ghūnna in the middle of a word is often an omitted diacritic.
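For the letters carrying madda or hamza, the split into a base letter plus a mark can be inspected with Unicode canonical decomposition. The Python sketch below uses the standard `unicodedata` module; the helper name `decompose` is hypothetical. Note two limits of the approach: dotted letters such as ب have no canonical decomposition, so the skeleton-plus-dots view in the table is a description of shape rather than of encoding, and Unicode decomposes U+0626 to Arabic yeh U+064A plus hamza above, not to the Farsi yeh U+06CC.

```python
import unicodedata

# Precomposed letters from the table: alif madda, waw with hamza,
# yeh with hamza, heh goal with hamza, yeh barree with hamza.
PRECOMPOSED = ["\u0622", "\u0624", "\u0626", "\u06C2", "\u06D3"]


def decompose(letter: str) -> list:
    """Return the canonical (NFD) decomposition as a list of code points."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", letter)]


for letter in PRECOMPOSED:
    print(f"U+{ord(letter):04X}", unicodedata.name(letter), "->", decompose(letter))

# A dotted letter such as be (U+0628) is atomic in Unicode.
print(decompose("\u0628"))
```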

Letter forms table:

name Letter forms [footnote] name Shape [0] [Rasm] Unicode [0] [Rasm]
Isolated
glyph
zwj end
after
mid
between
start
before
Nastaliq
[fonts]
Naskh
end mid start ٮ ٮ ا ٮ ی ا ی base dots Letter Letter dots base
ا ‍ا ا‍ ٮا ا alif[6] alef[9] none ا ا none ا U+0627 none U+0627
آ ‍آ آ‍ ٮآ آ alif maddah[6] "alef with madda above"[9] ۤ ۤ آ U+0622 U+06E4
U+0653
ب ‍ب ‍ب‍ ب‍ ٮب ٮبا ٮبی با بی [6] beh[9] ٮ ٮ ب U+0628 U+FBB3 U+066E
پ ‍پ ‍پ‍ پ‍ ٮپ ٮپا ٮپی پا پی [6] peh[10] پ U+067E U+FBB9
ت ‍ت ‍ت‍ ت‍ ٮت ٮتا ٮتی تا تی [6] teh[9] ت U+062A U+FBB6
ٹ ‍ٹ ‍ٹ‍ ٹ‍ ٮٹ ٮٹا ٮٹی ٹا ٹی ṭē[6] tteh[10] ṭē ٹ U+0679 U+FBC0
ث ‍ث ‍ث‍ ث‍ ٮث ٮثا ٮثی ثا ثی [6] s̱ē ث U+062B U+FBB6
ج ‍ج ‍ج‍ ج‍ ٮج ٮجا ٮجی جا جی jīm [6] jeem[9] jīm ح ح ج U+062C U+FBB3 U+062D
چ ‍چ ‍چ‍ چ‍ ٮچ ٮچا ٮچی چا چی čē[6] [6] tcheh[10] چ U+0686 U+FBB9
ح ‍ح ‍ح‍ ح‍ ٮح ٮحا ٮحی حا حی baṛī ħē[6] baṛī ḥē none none ح U+062D none
خ ‍خ ‍خ‍ خ‍ ٮخ ٮخا ٮخی خا خی [6] khē[6] khah[9] k͟hē خ U+062E U+FBB2
د ‍د د‍ ٮد د dāl[6] dal[9] dāl none د د none د U+062F none U+062F
ڈ ‍ڈ ڈ‍ ٮڈ ڈ ḍāl[6] ddal[10] ḍāl ڈ U+0688 U+FBC0
ذ ‍ذ ذ‍ ٮذ ذ zāl[6] ẕāl ذ U+0630 U+FBB2
ر ‍ر ر‍ ٮر ر [6] reh[9] none ر ر none ر U+0631 none U+0631
ڑ ‍ڑ ڑ‍ ٮڑ ڑ ṛē[6] rreh[10] ṛē ڑ U+0691 U+FBC0
ز ‍ز ز‍ ٮز ز [6] ز U+0632 U+FBB2
ژ ‍ژ ژ‍ ٮژ ژ žē[6] zhē[6] zhē ژ U+0698 U+FBB6
س ‍س ‍س‍ س‍ ٮس ٮسا ٮسی سا سی sīn[6] seen[9] sīn none س س none س U+0633 none U+0633
ش ‍ش ‍ش‍ ش‍ ٮش ٮشا ٮشی شا شی šīn[6] shīn[6] sheen[9] shīn ش U+0634 U+FBB6
ص ‍ص ‍ص‍ ص‍ ٮص ٮصا ٮصی صا صی swād[6] sad[9] ṣwād none ص ص none ص U+0635 none U+0635
ض ‍ض ‍ض‍ ض‍ ٮض ٮضا ٮضی ضا ضی żwād[6] ẓwād ض U+0636 U+FBB2
ط ‍ط ‍ط‍ ط‍ ٮط ٮطا ٮطی طا طی tō’ē[6] tah[9] t̤o'ē none ط ط none ط U+0637 none U+0637
ظ ‍ظ ‍ظ‍ ظ‍ ٮظ ٮظا ٮظی ظا ظی zō’ē[6] z̤o'ē ظ U+0638 U+FBB2
ع ‍ع ‍ع‍ ع‍ ٮع ٮعا ٮعی عا عی ‘ain[6] ain[9] ‘ain none ع ع none ع U+0639 none U+0639
غ ‍غ ‍غ‍ غ‍ ٮغ ٮغا ٮغی غا غی ğain[6] ghain[9] g͟hain غ U+063A U+FBB2
ف ‍ف ‍ف‍ ف‍ ٮف ٮفا ٮفی فا فی [6] feh[9] ڡ ڡ ف U+0641 U+FBB2 U+06A1
ق ‍ق ‍ق‍ ق‍ ٮق ٮقا ٮقی قا قی qāf[6] qaf[9] qāf ٯ ٯ ق U+0642 U+FBB4 U+066F
ک ‍ک ‍ک‍ ک‍ ٮک ٮکا ٮکی کا کی kāf[6] kāf none ک ک none ک U+06A9 none U+06A9
گ ‍گ ‍گ‍ گ‍ ٮگ ٮگا ٮگی گا گی gāf[6] gaf[10] gāf گ U+06AF Overline U+203E
Lam
[footnote]
ل ‍ل ‍ل‍ ل‍ ٮل ٮلا ٮلی لا لی lām[6] lam[9] lām none ل ل none ل U+0644 none U+0644
م ‍م ‍م‍ م‍ ٮم ٮما ٮمی ما می mīm[6] meem[9] mīm none م م none م U+0645 none U+0645
ن ‍ن ‍ن‍ ن‍ ٮن ٮنا ٮنی نا نی nūn[6] noon[9] nūn ں ں ن U+0646 U+FBB2 U+06BA
ں ‍ں ٘ [footnote] ٮں ٘ never
starts
nūn-e ğunnah[6] noon ghunna[10] nūn g͟hunnah ٘ ــ٘ـ ں U+06BA U+0658
و ‍و و‍ ٮو و vāō[6] wāō[6] waw[9] wā'o none و و none و U+0648 none U+0648
ہ ‍ہ ‍ہ‍ ہ‍ ٮہ ٮہا ٮہی ہا ہی gōl hē[6] heh goal[10] gol hē ہہہ ہ ہ ہہہ ہ U+06C1

[footnote]
ه ‍ه ‍ه‍ ه‍ ٮه ٮها ٮهی ها هی čhōṭī hē[6]
choṭī hē
ههه ه ه ههه ه U+0647
do-cashmī hē ھ ‍ھ ‍ھ‍ ھ‍ ٮھ ٮھا ٮھی ھا ھی dō-čašmī hē[6] heh doachashmee[10] do-cashmī hē ھھھ ھ ھ ھھھ ھ U+06BE    
ی ‍ی ‍ی‍ ی‍ ٮی ٮیا ٮیی یا یی čhōṭī yē[6] / choṭī yē ی یــیــی ی ی U+06CC U+FBB5
ے ‍ے ے‍ ٮے ے baṛī yē none ے ے none ے U+06D2 none U+06D2
Hamzah
ء ء ء hamza[9] (hamza on the line) ء ء U+0621
ٴ ٴ ٴ (hamza diacritic) ٴ ــٔـ U+0674 U+0654
ئ ‍ئ ‍ئ‍ ئ‍ ٮئ ٮئا ٮئی ئا ئی "yeh with hamza above"[9] yē hamza ٴ
ٔ
ی ی ٴ
◌ٔ
ــٔـ
ئ
ئـ ـئـ ـئ
U+0626 YEH WITH HAMZA ABOVE High Hamza U+0674 non‑spacing Hamza above U+0654 on dotted circle U+25CC and Tatweel U+0640 U+06CC
Farsi Yeh
alif hamza alif hamza ى ى U+0649
Alef Maksura
ۓ ‍ۓ ۓ‍ ٮۓ ۓ "yeh barree with hamza above"[10] baṛī yē hamza ے ے ۓ U+06D3 U+06D2
ؤ ‍ؤ ؤ‍ ٮؤ ؤ vāv-e mahmūz[6] "waw with hamza above"[9] و و ؤ U+0624 U+0648
ٶ‎ ‍ٶ‎ ٶ‎ ٮٶ‎ ٶ‎ "high hamza waw"[10] و و ٶ‎ U+0676 U+0648
ۂ ‍ۂ ‍ۂ‍ ۂ‍ ٮۂ ٮۂا ٮۂی ۂا ۂی "heh goal with hamza above"[10] ہ ہ ۂ U+06C2 U+06C1
Ta marbuutah
Arabic
[footnote]
ۃ ‍ۃ ‍ۃ‍ ۃ ٮۃ ۃ teh marbuta goal[10] ‍ہ ـہ ـۃ U+06C3 U+FBB4 U+06C1
ة ‍ة ‍ة‍ ة ٮة ة teh marbuta[9] Tāʼ marbūṭah ‍ه ـه ـة U+0629
U+0647
ت ‍ت ‍ت‍ ت‍ ٮت ٮتا ٮتی تا تی teh[9] Tāʼ ٮ ٮ ت U+062A
(see above)
U+066E
Footnotes on letter forms:
illustration illustrated text note

نستعلیق
^ Nastaliq: This may not display as Nastaliq style, depending on which fonts you have installed on your device. [end note]
ں ◌٘ ھ ڑ ے ^ Start forms vs starting words: No Urdu word begins with ے , ھ , ڑ , or ◌٘ / ں but some of these forms appear following a non-joining letter ا و د ڈ ذ ر ڑ ز ژ in the middle of a word.
ے ^ Baṛī ye: "greater yē" (بڑی يے) is used only at the end of a word[22].
ک گ ^ Kaf and Gaf before tall letters: Simpler fonts, including early fonts developed for Arabic, usually have just two or three forms of each letter. But in Urdu's usual Nastaliq script, letters can have more than three position forms depending on which letters they are attached to. This is sometimes simplified by digital fonts - even modern Urdu Nastaliq fonts - which do not perfectly replicate the nuance of handwriting, but in the case of Gāf and Kāf it is prominent. [23][24]

لا   لا

^ Lam Alef ligature: Lam ل followed by Alif ا forms a specific ligature لا in most Arabic writing styles but this is less dramatic in Urdu script.

Vowels

The Urdu language has a total of 10 vowels: 3 short, 5 long and 2 diphthongal. As in its parent Arabic alphabet, Urdu vowels are represented using a combination of digraphs and diacritics. Alif, Wāʾo, Ye, He and their variants are used to represent vowels.

Vowel chart

Urdu does not have standalone vowel letters; as is characteristic of abjads, certain consonant letters also serve to mark vowels, a device called matres lectionis. Short vowels (a, i, u), which do not occur word-finally, are represented by optional diacritics (zabar, zer, pesh) on the preceding consonant, or on a placeholder consonant (alif, ain, or hamza) if the syllable begins with the vowel. Long vowels are represented by the consonants alif, ain, ye, and wāʾo with disambiguating diacritics, some of which are optional (zabar, zer, pesh) and some of which are not (madd, hamza). This is a table of Urdu vowels:

Romanization IPA Final Middle Initial
a /ə/ N/A ـَ اَ
ā /aː/ ـَا ؛ ـَی ؛ ـَہ ـَا آ
i /ɪ/ N/A ــِـ اِ
ī /iː/ ـِى ـِيـ اِی
e /eː/ ـے‬ ـيـ اے
ai /ɛː/ ـَے‬ ـَيـ اَے
u /ʊ/ N/A ـُ اُ
ū /uː/ ـُو اُو
o /oː/ ـو او
au /ɔː/ ـَو اَو

Alif

Alif, the first letter of the Urdu alphabet, is a glottal stop consonant but is used exclusively as a vowel, except in the syllable-initial position, where it instead functions as a placeholder for syllable-initial short vowels, for example, اب ab, اسم ism, اڑ uṛ. As a vowel, it represents the long "a" (/ɑː/), for example, بھاگنا bhāgnā, but when it follows another alif it takes the form of a tilde-like diacritic called madd on top of that alif, for example, آپ āp.

Waʾo

Wāʾo is used, as a consonant/semivowel, for "w" (/w/) and its allophonic development, the labiodental approximant (/ʋ/), and, as a vowel, for long "u" (/uː/), long "o" (/oː/) and the monophthongized diphthong "au" (/ɔː/). However, when preceded by a k͟he (خ), wāʾo sometimes renders the short "u" (/ʊ/), for example, in خود k͟hud.

Ye and Bari ye

Ye has two variants. The regular Perso-Arabic ye (ی) is called choṭī ye ("lesser ye") to distinguish it from the variant called baṛī ye ("greater ye"). Choṭī ye is used, as a consonant/semivowel, for "y" (/j/) and, as a vowel, for long "i" (/iː/), long "e" (/eː/) and the monophthongized diphthong "ai" (/ɛː/).

Baṛī ye (ے) is however used to render the word-final long "e" and "ai" especially to distinguish prepositions and other single syllable words. Baṛī ye is never used as a consonant.

Letter's name, followed by the Final, Middle, Initial and Isolated forms in Nastaliq and then the Final, Middle, Initial and Isolated forms in Naskh:
بڑی يے
Baṛī yē
ے [none] ے ے [none] ے
چھوٹی يے  
Chotī yē
ی ی ی ی ی ی ی ی
يَاء
Arabic Yāʾ
ي ي ي ي ي ي ي ي

Nasal Nun

Vowel nasalization is indicated by placing a nūn (ن) after the vowel and removing the supralinear dot ( ں , always in word-final position) or placing a V-shaped or U-shaped diacritic called maghnoona or ulta jazm on top (ن٘). This is known as nūn g͟hunnā or nūn-e-g͟hunnā ("nūn of nasalization"). For example, the nasalized form of the word ہَے (hai, /ɦɛː/) is written ہَیں (ha͠i, /ɦɛ̃ː/). Word-medially it is also present for the homorganic nasalization in digraphs with velar and retroflex consonants, such as in ٹان٘گ (ṭāṅg, /ʈɑːŋɡ/) or گھن٘ٹہ (ghaṇṭā, /ɡʱəɳʈɑː/), where the maghnoona or ulta jazm is often ignored unless disambiguation is necessary (as with Arabic-script diacritics in general).

Examples:
Position Urdu Transcription / Transliteration IPA spelling for Hindi equivalent Translation
Nasta'lyq Arial font [A]
Orthography ں ں / ◌̃ / (diacritic on a vowel) e.g. /ɛ̃ː/ /æ̃:/ ँ ं
End
form
مَیں مَیں maiṉ ma͠i /mæ̃:/ [25] /mɛ̃ː/ मैं I (first person singular pronoun) or egotism [25]
میں میں mẽ /mẽ:/ में in / within / among / between / at [25]
ہَیں ہَیں ha͠i /ɦɛ̃ː/ /hæ̃:/ "are" (auxiliary verb) [26]
Middle
form [M]
کن٘ول کن٘ول kaṉwal /kə̃vəl/ [27] Lotus flower [27]
گھن٘ٹہ گھن٘ٹہ ghaṇṭā ɡʱəɳʈɑː घंटा ghanta ritual bell[28], hour[28], clock[28], slang for penis[28], or vague non-specific expletive[29]  
ٹان٘گ ٹان٘گ ṭāṅg /ʈɑːŋɡ/ टांग the leg[30]    
پن٘جابی پن٘جابی Punjabi

Urdu: [pəndʒɑ:bi] [31]
Punjabi: [pənˈdʒaːbːi]

Punjabi: ਪੰਜਾਬੀ
Hindi: पंजाबी
Punjabi / Panjabi
Footnotes:
^[A] Arial is a popular font for writing Arabic; it is included for readers who are not familiar with the letters in the Nasta'liq style.
^[M] For the medial form, the maghnoona or ulta jazm is often ignored unless disambiguation is necessary.
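The two spellings of the nasal nūn described in this section can be sketched in Python as follows. The helper names `nasal_nun` and `drop_maghnoona` are hypothetical; the code points are U+06BA for the dotless word-final form, and U+0646 followed by the mark U+0658 for the medial form, as given in the tables above.

```python
NUN = "\u0646"         # nun
NUN_GHUNNA = "\u06BA"  # dotless nun ghunna, used word-finally
MAGHNOONA = "\u0658"   # mark written over a medial nun


def nasal_nun(word_final: bool) -> str:
    """Return the spelling of the nasalizing nun for the given position."""
    return NUN_GHUNNA if word_final else NUN + MAGHNOONA


def drop_maghnoona(word: str) -> str:
    """Remove the optional medial mark, as ordinary writing often does."""
    return word.replace(MAGHNOONA, "")


# Build the example word for "leg": tte + alif + medial nasal nun + gaf.
leg = "\u0679\u0627" + nasal_nun(word_final=False) + "\u06AF"
# Build the example word for "in": mim + ye + final nun ghunna.
inside = "\u0645\u06CC" + nasal_nun(word_final=True)

print([f"U+{ord(c):04X}" for c in leg])
print([f"U+{ord(c):04X}" for c in drop_maghnoona(leg)])
print([f"U+{ord(c):04X}" for c in inside])
```

Dropping the mark leaves a plain nūn, so the nasalization has to be recovered from context, which matches footnote [M].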

Vowel Diacritics

Urdu uses the same subset of diacritics used in Arabic based on Persian conventions. Urdu also uses Persian names of the diacritics instead of Arabic names. Commonly used diacritics are zabar (Arabic fatḥah), zer (Arabic kasrah), pesh (Arabic dammah) which are used to clarify the pronunciation of vowels, as shown above. Jazam (ـْـ , Arabic sukun) is used to indicate a consonant cluster and tashdid (ـّـ, Arabic shaddah) is used to indicate a gemination, although it is never used for verbs, which require double consonants to be spelled out separately. Other diacritics include khari zabar (Arabic dagger alif), do zabar (Arabic fathatan) which are found in some common Arabic loan words. Other Arabic diacritics are also sometimes used though very rarely in loan words from Arabic. Zer-e-izafat and hamzah-e-izafat are described in the next section.

Other than common diacritics, Urdu also has special diacritics, which are often found only in dictionaries for the clarification of irregular pronunciation. These diacritics include kasrah-e-majhool, fathah-e-majhool, dammah-e-majhool, maghnoona, ulta jazam, alif-e-wavi and some other very rare diacritics. Among these, only maghnoona is used commonly in dictionaries and has a Unicode representation at U+0658. Other diacritics are only rarely written in printed form, mainly in some advanced dictionaries.[32]


The two He's

He has two variants: gol he ("round he") and do-cashmī he ("two-eyed he").

Gol he (ہ) is the primary letter for the "h" (/ɦ/) sound but word-finally is pronounced as a long "a" or "e" (/ɑː/ or /e:/).

Do-cashmī he (ھ), which is written as a looped medial or initial hāʾ, is used to orthographically produce aspiration and breathy voice and sometimes to write Arabic words.

Gol he and do-cashmi he diverged from the Arabic letter he. Sometimes choti he is used to refer to gol he, while sometimes choti he refers to the Arabic version. The distinction is somewhat artificial, since gol he is an equivalent letter to the Arabic letter, but they have separate Unicode characters. Some fonts make the Arabic he look the same as gol he or do-cashmi he.

depictions of hey
He in very different fonts Nastaliq (N), Arial (A), and Tahoma (T) [footnote]
Letter name
and unicode
Isolated Form Final Form Middle Form Initial Form
N A T N A T N A T N A T
گول ہے
Gol he
U+06C1[33]
ہ ہ ہ ہ ـہ ـہ ہ ـہـ ـہـ ہ ہـ ہـ
دو چشمی ہے
Do-cashmī he
U+06BE[34]
ھ ھ ہ ھ ـھ ـہ ھ ـھـ ـہـ ھ ھـ ہـ
Arabic Letter Heh
U+0647[35]
ه ه ه ه ـه ـه ه ـهـ ـهـ ه هـ هـ

Footnotes: ^

  • This may display in different fonts to those listed if you do not have Arial, Tahoma, and a Nastaliq font installed.
  • Nasta'liq is the style used for almost everything written in Urdu, from official documents to web memes. This will only display in Nastaliq if you have a Nastaliq font installed on your system, such as Urdu Typesetting (on Windows), Google's Noto Nastaliq Urdu,[36] or SIL International's Awami Nastaliq.[37]
  • Arial is a font commonly used for Arabic, but it also includes the Urdu letters.
  • Tahoma has an extensive and distinctive Arabic character set, particularly for Hey.
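Because fonts can make the three he characters look alike, it can help to inspect the underlying code points rather than the rendered glyphs. The Python sketch below uses the standard `unicodedata` module; the labels in `HE_VARIANTS` and the helper names are chosen for this example only.

```python
import unicodedata

# The three code points discussed in this section.
HE_VARIANTS = {
    "gol he": "\u06C1",
    "do-cashmi he": "\u06BE",
    "Arabic heh": "\u0647",
}


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def count_he_variants(text: str) -> dict:
    """Count how often each he variant occurs in a piece of text."""
    return {label: text.count(ch) for label, ch in HE_VARIANTS.items()}


for label, ch in HE_VARIANTS.items():
    print(label, "->", describe(ch))

# The word for "house" (gaf + do-cashmi he + re) contains one do-cashmi he.
print(count_he_variants("\u06AF\u06BE\u0631"))
```

A check like this is one way to spot text in which the Arabic heh has been typed where gol he or do-cashmi he was intended.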

Aspirated and breathy voiced consonants

Table of digraphs:
corresponding single-letter consonants "aspirated" consonants, " breathy voiced" consonants, and other digraphs [2][7][6][38][39][40][41]
Urdu Letter Devanagari IPA[ə] Urdu Digraphs Urdu Name(s) [6][39] Romanised name(s) Devanagari [38][6] Hindi name [41] ALA‑LC [7] IPA[ə]  
ھ (none) ʱ / ʰ (below) دوچَشْمی ہے [6] dō‑čašmī hē (none) h ʱ / ʰ
ب بھ [6] [40] بھے [6] bhē[6] bh bʱ [40]
پ پھ پھے [6] phē[6] ph [40]
ت تھ [6] تھے [6] thē[6] th t̪ʰ [40]
ٹ ٹھ ٹھے [6] ṭhē[6] ṭh ʈʰ
ج جھ [6] [40] جھے [6] jhē[6] jh d͡ʒʱ
[40]
چ چھ [6] چھے [6] čhē[6] / chē[6] / chhē ch t͡ʃʰ
د دھ [40] دھے [6] [40] dhē[6] dh d̪ʱ [40]
ڈ ڈھ [6] [40] ڈھے [6] ḍhē[6] ḍh ɖʱ [40]
ر رھ [40] [[ ]] __ rʱ [40]
ڑ ड़ ڑھ [40] ڑھے ṛhē [6] ढ़ ṛh ɽʱ [40]
ک کھ کھے khē kh kʰ
گ گھ [6] گھے [6] ghē [6] gh ɡʱ [40]
ن نھ [40] न्ह nh [40]
م مھ [40] म्ह mh [40]
ل لھ [40]

ल्ह

[40]
ی یھ य्ह
و وھ व्ह ʋʱ
ه هھ
ل ا لا لام الِف [6] lām alif ला [6] la

Footnote: The Devanagari equivalents all add a schwa ə vowel to the IPA.


[40] IPA, transliteration and LOC romanization: d̪ʱ dʰ dh; ʈʰ ʈʰ ṭh; ɖʱ ɖʰ ḍh; kʰ kʰ kh; ɡʱ gʰ gh; t͡ʃʰ čʰ ch; hʱ هʰ hh; mʱ mʰ mh; nʱ nʰ nh; ɽʱ ɽʰ ṛh


Examples of digraph usage:
corresponding
single-letters
digraphs Example
Urdu
Letter
IPA Urdu
Digraphs
[7][2][39]
IPA Urdu with
diacritics
IPA Latin Alphabet translation
ب بھ bʱ بھارت /bʰɑ:rət̪/
[42]
India [42]
بھالو // bhalo
[39]
[[]]
بھاری bʰɑ:ri heavy / fat / bulky / loud / difficult / important / wealthy [43]   
پ پھ پھول [39] / / phul
[39]
[[ ]]
پھول / / [[ ]]
ت تھ [footnote] تھم / / [[ ]]
تھال [39] / / thal
[39]
[[ ]]
لتھیم [44] / / Lithium Lithium  
ٹ ٹھ ʈʰ ٹھوس /ʈʰo:s/
[45]
solid / compact / firm / true / dull [45]
ٹھیلا [39] / / thela
[39]
[[ ]]
ٹھیس / / [[ ]]
ج جھ d͡ʒʱ جھاڑی / / [[ ]]
/ / [[ ]]
چ چھ t͡ʃʰ چھری [39] / / chhuri
[39]
[[ ]]
چھوکرا / / [[ ]]
د دھ dʱ دھم [39] / / dham
[39]
[[ ]]
گندھک /gənd̪ʰək/
[46]
sulphur / brimstone [46]
دھوبی / / [[ ]]
ڈ ڈھ ɖʱ ڈھال [39] / / dhal
[39]
[[ ]]
ڈھول / / [[ ]]
ر رھ [[ ]][[ ]] / / [[ ]]
/ / [[ ]]
ڑ ڑھ ɽʱ گڑھ [39] / / garh
[39]
[[ ]]
کڑھنا / / [[ ]]
ک کھ kʰ کھولنا / / [[ ]]
کھانا [39] /kʰɑ:nɑ:/
[47]
khana
[39]
food / meal / banquet [47]
دکھائی [48] /d̪ɪkʰɑ:i:/
[49]
inspection / appearance / showing / show [49]
گ گھ ɡʱ گھر [39] / / ghar
[39]
[[ ]]
گھبراہٹ / / [[ ]]
ن نھ ننّھا /nənnʰɑ:/
[50]
small / tiny
[50]
___ / / [[ ]]
م مھ تمھیں [51] / / [[ ]] (alternative of تُمہیں)
___ / / [[ ]]
و وھ [[ ]] / / [[ ]]
/ / [[ ]]
ل لھ
[39]

[52]

[40]
دولھا [39] دُولھا
[52]
/d̪u:lʰɑ:/
[52]
bridegroom [52]
/ / [[ ]]
ل ا لا
[footnote]
خلا [48] /xəlɑ:/
[53]
outer space, vacuum, vacant place, absence [53]
ملاعین [54] /məlɑ:ʔi:n/
[54]
accursed persons [54]
علاج عِلاج /ɪlɑ:dʒ/ [55] cure / remedy / antidote / relief [55]

^Lam Alif: This ligature is much more prominent in Arabic styles than it is in Urdu's usual Nastaliq.

^Transliteration of "th": The digraph تھ is often used to transliterate "th" in European words, e.g. لتھیم Lithium [44].
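Since each aspirated or breathy voiced consonant is written as a base letter followed by do-cashmī he, a word can be split into its written units by attaching every ھ (U+06BE) to the letter before it. The Python sketch below shows one way to do this; the function name `split_units` is hypothetical, and treating every combining mark (Unicode category Mn) as part of the preceding unit is an assumption made for this example.

```python
import unicodedata

DO_CHASHMI_HE = "\u06BE"


def split_units(word: str) -> list:
    """Split a word into letters, keeping digraphs and diacritics together."""
    units = []
    for ch in word:
        attaches = ch == DO_CHASHMI_HE or unicodedata.category(ch) == "Mn"
        if attaches and units:
            units[-1] += ch   # extend the previous letter into a digraph
        else:
            units.append(ch)  # start a new unit
    return units


# The word for "India" from the table: be + do-cashmi he, alif, re, te.
bharat = "\u0628\u06BE\u0627\u0631\u062A"
for unit in split_units(bharat):
    print([f"U+{ord(c):04X}" for c in unit])
```

The split is purely orthographic: it says nothing about whether a given base letter plus ھ is one of the recognised digraphs in the table.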

Differences from the Persian alphabet

Urdu has more letters added to the Persian base to represent sounds not present in Persian, which already has additional letters added to the Arabic base itself to represent sounds not present in Arabic. The letters added include:

  • ṭē ٹ to represent voiceless retroflex stop /ʈ/
  • ḍāl ڈ to represent voiced retroflex stop /ɖ/
  • ṛē ڑ to represent retroflex flap /ɽ/
  • nūn-e ğunnah ں to represent nasal vowel /◌̃/
  • a separate do-cashmi-hē letter ھ exists to denote an aspirated consonant /ʰ/ or a murmured voice /ʱ/. This letter is mainly used as part of the multitude of digraphs, detailed below.
  • baṛī yē ے, used to represent a long open-mid front unrounded vowel /ɛː/ or a long close-mid front unrounded vowel /eː/ at the end of a word, is a derivative of the Persian letter yē ی یـ ـیـ ـی, which in Urdu is called čhōṭī yē. (The Persian and Urdu versions differ from the Arabic version يـ ـيـ ـي ي by omitting the dots in the final and isolated forms.)

Retroflex letters


Old Hindustani used four dots over three Arabic letters to represent retroflex consonants: ٿ, ڐ and ڙ.[56] In handwriting those dots were often written like a small vertical line attached to a small triangle. Subsequently, this shape became identical to a small letter ط.[57] (It is commonly and erroneously assumed that ṭāʾ itself was used to indicate retroflex consonants because of its being an emphatic alveolar consonant that Arabic scribes thought approximated the Hindustani retroflexes. In modern Urdu ط, called to'e, is always pronounced as a dental, not a retroflex.)


modern Urdu Sindhi retroflex consonant old Urdu form Sindhi IPA for ٿ ڐ ڙ
alphabet Unicode name IPA alphabet Unicode alphabet Unicode
ٹ U+0679 ٹے ṭē /ʈ/ ٽ U+067D ٿ U+067F /tʰ/ or /t͡ɕʰ/
ڈ U+0688 ڈال ḍāl /ɖ/ ڊ U+068A ڐ U+0690 none
ڑ U+0691 ڑے ṛē /ɽ/ ڙ U+0699 ڙ U+0699 /ɽ/


Converting to and from the English alphabet

Conversion between the Urdu and English alphabets does not work the same way in both directions. When Urdu script is converted to the Latin alphabet, the letters د and ت are usually shown as "d" and "t", respectively. Even in IPA, where the most precise depiction of these letters is d̪ and t̪, they are often simplified to d and t. In the other direction, the corresponding retroflex letters ڈ and ٹ are the ones most often used for "d" and "t" in European loan words and in transliterations of proper nouns. These letters are rarer in Urdu, to the extent that European loan words like ڈاکٹر (doctor) and ٹماٹر (tomato) are often the examples given when teaching children the Urdu alphabet.


Hindi devanagari equivalent [6][41] ड़ ढ़
Urdu letters and digraphs ط ت ٹ تھ ٹھ د ڈ دھ ڈھ ر ڑ ڑھ رھ
Urdu IPA /t/ /t̪/ /ʈ/ /t̪ʰ/ /ʈʰ/ /d̪/ /ɖ/ /d̪ʱ/ /ɖʱ/ /r/ /ɽ/ /ɽʱ/ (i)
English spellings
for Urdu words
ALA‑LC[7] t th ṭh d dh ḍh r ṛh
Hunterian[8] t th d dh r rh
English words and names t th d r
English IPA t θ / ð d (ii)
Urdu for English
words and names
ٹ تھ ڈ ر

^i. rare letter combination in Urdu. ^ii. English R varies by dialect.

Examples:
letters English spelling
[footnote]
English
pronunciation
Urdu
pronunciation
Urdu spelling simplified
script
(re)transliteration origin alternate spelling
T→ٹ tomato tʰə̥ˈmɑːtʰəʊ ṭamāṭar ٹماٹر ٹماٹر Spanish: tomate
[t̪oˈmat̪e]
Portuguese: tomate
Portugal /tuˈma.tɨ/
D→ڈ T→ٹ doctor /ˈdɒktə/ /ɖɑːkʈər/ [58] ڈاکٹر ڈاکٹر English [58]
ست→ST Pakistan پاکستان پاکستان Sindhi: پاڪستان
D→ڈ T→ٹ R→ر Donald Trump ڈونلڈ ٹرمپ ڈونلڈ ٹرمپ
D→ڈ T→ٹ TH→تھ R→ر Elizabeth II "Elizabeth the second" Elizabeth dovam [59] ایلزبتھ دوم ایلزبتھ دوم
D→ڈ T→ٹ R→ر Women Democratic Front ویمن ڈیموکریٹک فرنٹ ویمن ڈیموکریٹک فرنٹ a Pakistani feminist organisation [60][61]
T→ٹ TH→تھ Portsmouth پورٹسماؤتھ پورٹسماؤتھ
T→ٹ ST→سٹ Australia آسٹریلیا آسٹریلیا
ٹھ→TH
ٹ→T
Thatta ٹھٹہ ٹھٹہ Sindhi: ٺٽو
T→ٹ R→ر Rockhampton راکہیمپٹن راکہیمپٹن
D→ڈ Queensland کوئنزلینڈ کوئنزلینڈ
ST→سٹ R→ر Australia آسٹریلیا آسٹریلیا
TH→تھ Lithium لتھیم لتھیم [44]
TH→تھ Bismuth بسمتھ بسمتھ [44]
T→ٹ Cobalt کوبالٹ کوبالٹ [44]
T→ٹ ST→سٹ Tungsten ٹنگسٹن ٹنگسٹن [44]
T→ٹ R→ر Nitrogen نائٹروجن نائٹروجن [44]
D→ڈ R→ر Hydrogen ہائیڈروجن ہائیڈروجن [44]
R→ر T→ٹ Rockhampton راکہیمپٹن راکہیمپٹن
D→ڈ QU→کو S→ز Queensland ˈkwiːnzlænd کوئنزلینڈ کوئنزلینڈ
QU→کو Queanbeyan /ˈkwnbiən/ کوینبیان کوینبیان
D→ڈ W→و Darwin ˈdɑːrwɪn ڈارون ڈارون
D→ڈ T→ٹ TH→تھ R→ر Bathurst, New South Wales /ˈbæθərst/ باتھرسٹ باتھرسٹ

^Footnote: Some of these names originated in other languages, but they are the English spellings of the words, or are names of people or places from regions where English is the main (or only) language.

Comparison to Hindi Devanagari and Arabic

Urdu and Hindi are mostly mutually intelligible, to the point that they are sometimes considered to be one language,[62] but this view is controversial (see Hindi Urdu controversy). One of the biggest differences is the script; Hindi is usually written in Devanagari. Transliteration between the two scripts is neither simple nor unambiguous.[41] There are many cases where one character in Devanagari Hindi corresponds to multiple redundant characters in Persianised Urdu[6] or vice versa (see table below). In many of these cases the letters had different pronunciations in Arabic, from which the Urdu alphabet is derived (via the Persian alphabet). For example, the letters representing the emphatic consonants from Arabic, ط and ص, are pronounced the same way as the corresponding non-emphatic consonants ت and س in Urdu. However, when pronouncing Arabic words, particularly in a religious context, native speakers of Urdu go to great effort to pronounce the Arabic sounds unambiguously.

Redundancy and ambiguity of consonants:

Arabic phoneme for letter in IPA
[footnote]
Arabic letter name Arabic Letter
[footnote]
Urdu letter name [6][41] Urdu letter Urdu & Hindi IPA[ə] [63] Hindi Devanagari [6][41] Hindi letter name [41] Arabic letter for phoneme
[footnote]
MSA CA Naskh
(footnote)
Nasta'liq MSA CA
s sin س Seen س س s SA س
sod ص Suad ص ص
θ tha ث Saay ث ث
t / t̪ ta ت Tay ت ت TA ت
toh ط Toay ط ط
absent Ttay ٹ ٹ ʈ TTA absent
absent Daal (ḍāl) ڈ ڈ ɖ DA absent
absent
[footnote]
nūn-e ğunnah ں ں Chandrabindu absent ـني
[footnote]
kaf ك kāf ک ک ك
/j/
[footnote]
ya ي čhōṭī yē ی ی ي
/w/
[footnote]
wow و vāō / wāō و و ڤ ڥ

ف و
[footnote]

absent
[[ ]] [[ ]] [[ ]] [[ ]] [[ ]]
[[ ]] [[ ]] [[ ]] [[ ]] [[ ]]
ـهـ [footnote] dō‑čašmī hē ھ ھ [footnote] [[ ]]

^Hindi aspirated consonants: see aspirated consonants.


Footnotes:

^Naskh - Arial

^Hindi IPA ə - The Devanagari Hindi equivalents all add a schwa ə vowel to the IPA.

^ Arabic IPA - The IPA pronunciation of the letter in Classical and Modern Standard Arabic.

^ Arabic Letter - There are many differences between Arabic and Urdu writing, even when there is a direct one-to-one correspondence between the letters. Some of these differences are reflected in different Unicode characters (ك ک and ي ی), while others are only reflected in font choice and handwriting styles.

^ Arabic spelling - The spelling for the Urdu phoneme in Classical and Modern Standard Arabic.

^ Arabic dotless nun - Arabic does sometimes use a dotless nun ں historically in the rasm script, but this is just a different way to write nun ن not historically or phonetically equivalent to Urdu nūn-e ğunnah. The Rasm script omits dots from all other letters (e.g. ٮ ڡ ٯ- qaf, fa, and ba/ta/tha) rendering many letters indistinguishable.

^ Arabic nasal vowel - Nasalized vowels occur in Classical Arabic but not in contemporary speech or Modern Standard Arabic. There is no orthographic way to denote the nasalization, but it is systematically taught as part of the essential rules of tajwid, used to read the Qur'an. Nasalization occurs in recitation, usually when nūn is followed by a yā’ ـني at the end of a word.

^Letters that can be consonants or vowels - This is the IPA for ye ی and wow و as consonants; for their vowel pronunciations, see the vowel table below. The Urdu and Arabic ی and و can be a consonant or a vowel depending on context, like English Y.

^W and V. In Arabic, و represents W. For the sound V, letters from related alphabets (ڤ ڥ) are often used, or the Arabic letter ف (which normally represents "F", as it does in Urdu).

comparison to neighbouring languages

Iẓāfat

Iẓāfat is a syntactical construction of two nouns, where the first component is a determined noun, and the second is a determiner. This construction was borrowed from Persian. A short vowel "i" is used to connect these two words, and when pronouncing the newly-formed word the short vowel is connected to the first word. If the first word ends in a consonant or an ʿain (ع), it may be written as zer (ِ) at the end of the first word, but usually is not written at all. If the first word ends in choṭī he (ہ) or ye (ی or ے) then hamzā (ء) is used above the last letter (ۂ or ئ or ۓ). If the first word ends in a long vowel (ا or و), then baṛī ye (ے) with hamzā on top (ئے) is added at the end of the first word.[64]

Forms Example Transliteration Meaning
ــِ شیرِ پنجاب sher-e Punjāb the lion of Punjab
ئ ولئ کامل walī-ye kāmil perfect saint
ئے روئے زمین rū-ye zamīn the surface of the Earth
ئے صدائے بلند sadā-ye buland a high voice
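
The spelling rules above can be sketched as a small function. This is a minimal Python illustration under stated assumptions, not a complete treatment: the name `add_izafat` and its `write_zer` flag are hypothetical, the precomposed code points U+06C2, U+0626 and U+06D3 are assumed for the hamza forms, and the function treats every final و as a long vowel and ignores trailing diacritics, neither of which real text guarantees.

```python
ZER = "\u0650"  # zer (kasra), the optional short-vowel mark


def add_izafat(first_word: str, write_zer: bool = True) -> str:
    """Return the first word of an izafat pair with its linking mark attached.

    Hypothetical helper: it only inspects the final character of the word.
    """
    if not first_word:
        return first_word
    last = first_word[-1]
    if last == "\u06C1":              # choti he: replace with he + hamza above
        return first_word[:-1] + "\u06C2"
    if last == "\u06CC":              # choti ye: replace with ye + hamza above
        return first_word[:-1] + "\u0626"
    if last == "\u06D2":              # bari ye: replace with bari ye + hamza above
        return first_word[:-1] + "\u06D3"
    if last in ("\u0627", "\u0648"):  # alif or wao (assumed long vowel): append hamza-ye + bari ye
        return first_word + "\u0626\u06D2"
    # Consonant or ain: zer may be written, but is usually omitted.
    return first_word + ZER if write_zer else first_word


# The four first words from the table, written with escapes to avoid font issues.
sher = add_izafat("\u0634\u06CC\u0631")  # sher  -> zer appended
wali = add_izafat("\u0648\u0644\u06CC")  # wali  -> final ye replaced
ru = add_izafat("\u0631\u0648")          # ru    -> hamza-ye + bari ye appended
sada = add_izafat("\u0635\u062F\u0627")  # sada  -> hamza-ye + bari ye appended
```

Passing `write_zer=False` models the more common practice of leaving the zer unwritten after a consonant.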

Romanization standards and systems

Bibles in Roman Urdu, such as this one published by the Bible Society of India, are used by many Christians from the Indian subcontinent.

There are several romanization standards for writing Urdu with the Latin alphabet, though they are not very popular because most fall short of representing the Urdu language properly. Instead of standard romanization schemes, people on the Internet, mobile phones and media often use a non-standard form of romanization that tries to mimic English orthography. The problem with this kind of romanization is that it can only be read by native speakers, and even for them with great difficulty. Among standardized romanization schemes, the most accurate is ALA-LC romanization, which is also supported by the National Language Authority. Other romanization schemes are often rejected because they either cannot represent the sounds of Urdu properly or disregard Urdu orthography and favor pronunciation over it.[65]

Roman Urdu also holds significance among the Christians of Pakistan and North India. Urdu was the dominant native language among Christians of Karachi and Lahore in present-day Pakistan and of Madhya Pradesh, Uttar Pradesh and Rajasthan in India during the early part of the 19th and 20th centuries, and is still used by Christians in these places. Pakistani and Indian Christians often used the Roman script for writing Urdu, so Roman Urdu was a common way of writing among them in these areas up to the 1960s. The Bible Society of India publishes Roman Urdu Bibles, which sold well late into the 1960s and are still published today. Church songbooks are also common in Roman Urdu. However, the usage of Roman Urdu is declining with the wider use of Hindi and English in these states.

Computers and the Urdu alphabet

In the early days of computers, Urdu was not properly represented on any code page. One of the earliest code pages to represent Urdu was IBM Code Page 868, which dates back to 1990.[66] Other early code pages that represented the Urdu alphabet were Windows-1256 and the MacArabic encoding, both of which date back to the mid-1990s. In Unicode, Urdu is represented inside the Arabic block. Another code page for Urdu, used in India, is the Perso-Arabic Script Code for Information Interchange. In Pakistan, the 8-bit code page developed by the National Language Authority is called Urdu Zabta Takhti (اردو ضابطہ تختی) (UZT).[67] It represents Urdu in its most complete form, including some of its specialized diacritics, though UZT is not designed to coexist with the Latin alphabet.

Encoding Urdu in Unicode

Like other writing systems derived from the Arabic script, Urdu uses the 0600–06FF Unicode range.[68] Certain glyphs in this range appear visually similar (or identical when presented using particular fonts) even though the underlying encoding is different. This presents problems for information storage and retrieval. For example, the University of Chicago's electronic copy of John Shakespear's "A Dictionary, Hindustani, and English"[69] includes the word 'بهارت' (India). Searching for the string "بھارت" returns no results, whereas querying with the (identical-looking in many fonts) string "بهارت" returns the correct entry.[70] This is because the medial form of the Urdu letter do chashmi he (U+06BE)—used to form aspirate digraphs in Urdu—is visually identical in its medial form to the Arabic letter hāʾ (U+0647; phonetic value /h/). In Urdu, the /h/ phoneme is represented by the character U+06C1, called gol he (round he), or chhoti he (small he).

Confusable glyphs in Urdu and Arabic script
Characters in Urdu Characters in Arabic
ہ (U+06C1), ھ (U+06BE) ه (U+0647)
ی (U+06CC) ى (U+0649), ي (U+064A)
ک (U+06A9) ك (U+0643)
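
One way to work around the search mismatch described above is to fold each group of look-alike code points to a single representative before comparing strings. The following is a minimal Python sketch, not an established API: the table `CONFUSABLE_FOLD` and the function `search_key` are hypothetical names, and the mapping covers only the code points listed in the table above. Because the fold merges gol he and do chashmi he, the result is suitable only as a matching key, not as corrected Urdu text.

```python
# Hypothetical folding table built from the confusable pairs listed above.
CONFUSABLE_FOLD = str.maketrans({
    "\u06C1": "\u0647",  # Urdu gol he        -> Arabic heh
    "\u06BE": "\u0647",  # Urdu do chashmi he -> Arabic heh
    "\u0649": "\u06CC",  # Arabic alef maksura -> Urdu ye
    "\u064A": "\u06CC",  # Arabic yeh          -> Urdu ye
    "\u0643": "\u06A9",  # Arabic kaf          -> Urdu kaf
})


def search_key(text: str) -> str:
    """Return a folded copy of text for comparison or indexing only."""
    return text.translate(CONFUSABLE_FOLD)


# The two spellings from the dictionary example, given as escapes so the
# difference does not depend on the display font.
urdu_spelling = "\u0628\u06BE\u0627\u0631\u062A"    # with do chashmi he (U+06BE)
arabic_spelling = "\u0628\u0647\u0627\u0631\u062A"  # with Arabic heh (U+0647)

print(urdu_spelling == arabic_spelling)                          # False
print(search_key(urdu_spelling) == search_key(arabic_spelling))  # True
```

Applying `search_key` to both the stored entries and the query lets either spelling find the entry, at the cost of also matching words that differ only in the kind of he.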

In 2003, the Center for Research in Urdu Language Processing (CRULP)[71]—a research organisation affiliated with Pakistan's National University of Computer and Emerging Sciences—produced a proposal for mapping from the 1-byte UZT encoding of Urdu characters to the Unicode standard.[72] This proposal suggests a preferred Unicode glyph for each character in the Urdu alphabet.

Software

The Daily Jang was the first Urdu newspaper to be typeset digitally in Nastaʻliq by computer. There are efforts underway to develop more sophisticated and user-friendly Urdu support on computers and on the Internet. Nowadays, nearly all Urdu newspapers, magazines, journals and periodicals are composed on computers using various Urdu software programmes, the most widespread of which is the InPage desktop publishing package. Microsoft has included Urdu language support in all new versions of Windows, and both Windows Vista and Microsoft Office 2007 are available in Urdu through Language Interface Pack[73] support. Most Linux desktop distributions allow the easy installation of Urdu support and translations as well.[74] Apple implemented the Urdu language keyboard across mobile devices in its iOS 8 update in September 2014.[75]

Computing and Typesetting

Despite the invention of the Urdu typewriter in 1911, Urdu newspapers continued to publish prints of handwritten scripts by calligraphers known as katibs or khush-navees until the late 1980s. The Pakistani national newspaper Daily Jang was the first Urdu newspaper to use Nastaliq computer-based composition.

Keyboard

An Urdu keyboard is available on all major platforms, such as Android, iOS and Windows, but the layouts vary. For instance, Android and iOS devices usually use the phonetic keyboard, whereas Windows machines use the UZT keyboard, although the phonetic version is also available for Windows. macOS machines use the same phonetic keyboard as iOS devices.

Font

As of April 2020, iOS and macOS are the only platforms to use a Nastaliq font as standard for the Urdu language.

Use of Urdu keyboard layout for other languages

Windows 10 uses the Urdu keyboard for the Arabic-script versions of the Punjabi and Sindhi languages, despite the Urdu keyboard missing several Sindhi letters (ڪ ڳ ڱ ڦ ٺ ٻ ڀ ڊ ڍ ڌ ڏ ڇ ڄ ڃ ي ڻ ۽ ۾) and providing the Urdu versions of ٹ ڑ, which in Sindhi are written as ٿ ڙ (see Retroflex letters). See also: Urdu keyboard

Geographic distribution

In addition to Pakistan, the Urdu language is official in five states of India: Bihar, Delhi, Jammu and Kashmir, Telangana, and Uttar Pradesh.

Other than the Indian subcontinent, the Urdu script is also used by Pakistan's large diaspora, including in the United Kingdom, the United Arab Emirates, the United States, Canada, Saudi Arabia, and other places.[76]

Many Urdu speakers living outside of Pakistan use the Latin alphabet to write Urdu due to the limited availability of software for writing Urdu.

Distinction from Hindi

There are conflicting points of view about the division between Hindi and Urdu. (Main article: Hindi Urdu controversy.)

Some people hold the view that the distinction is old and intrinsic to the languages. The Urdu language emerged as a distinct register of Hindustani well before the Partition of India. It is distinguished most by its extensive Persian influences. This stands to reason: Persian was the official language of the Mughal government and the most prominent lingua franca of the Indian subcontinent for several centuries before the rise of the Maratha Empire in the 17th and 18th centuries.

Others claim that the difference is recent and artificial, and more related to extrinsic cultural factors than it is to the language(s) themselves. The two languages are often collectively referred to as "Hindustani", but generally only by outsiders, and the term is regarded by some sources as outdated.

Urdu and Hindi, an official federal language of India, are different registers of the same language, and thus they are mutually intelligible and each can be written in the other's script. Usage of script generally signifies the user's faith: Muslims generally use the Urdu (Perso-Arabic) script, while Hindus use the Devanagari script.

In addition to Pakistan, the Urdu script is official in five states of India with a substantial percentage of Hindustani-speaking Muslims: Bihar, Delhi, Jammu and Kashmir, Telangana, and Uttar Pradesh.


Endnotes

^Note: Some of the Nastaliq text on this page will probably show in a different style if you do not have a Nastaliq font installed. If this نستعلیق and this نستعلیق looks like these four نستعلیق نستعلیق نستعلیق نستعلیق then you are probably seeing it written in a modern Arabic style.

See also

  • Nastaʻliq script
  • Persian alphabet
  • Urdu Wikipedia
  • Urdu keyboard
  • Urdu Braille
  • Urdu Informatics
  • Romanization of Urdu

References

  1. Project Fluency (7 October 2016). Urdu: The Complete Urdu Learning Course for Beginners: Start Speaking Basic Urdu Immediately (Kindle ed.). p. Kindle Locations 66–67. ISBN 978-1539047803.
  2. "Urdu alphabet, pronunciation and language". www.omniglot.com.
  3. "Controversy over number of letters in Urdu alphabet". DAWN.COM. 15 June 2009.
  4. "Corpus Based Urdu Lexicon Development" (PDF).
  5. Delacy 2003, p. XV–XVI.
  6. "Urdu Alphabet". www.user.uni-hannover.de. Retrieved 29 February 2020.
  7. "Urdu romanization" (PDF). The Library of Congress.
  8. Geographical Names Romanization in Pakistan. UNGEGN, 18th Session. Geneva, 12–23 August 1996. Working Papers No. 85 and No. 85 Add. 1.
  9. "آزادی". ur.oxforddictionaries.com. Retrieved 11 March 2020.
  10. "Unicode Utilities: UnicodeSet Arabic pedagogical symbols". unicode.org. Retrieved 20 March 2020.
  11. "چھوٹی". ur.oxforddictionaries.com. Retrieved 10 March 2020.
  12. "چھوٹا". ur.oxforddictionaries.com. Retrieved 10 March 2020.
  13. "گول". Oxford Urdu Living Dictionary. Retrieved 15 March 2020.
  14. "چشم Urdu to English Translation - Oxford Dictionaries". Oxford Urdu Living Dictionary. Retrieved 29 March 2020.
  15. "بَڑی". ur.oxforddictionaries.com. Retrieved 11 March 2020.
  16. "غُنّہ". oxforddictionaries. Retrieved 13 March 2020.
  17. "مہموز". ur.oxforddictionaries. Retrieved 14 March 2020.
  18. "حرف". ur.oxforddictionaries. Retrieved 15 March 2020.
  19. "حُرُوف". ur.oxforddictionaries. Retrieved 15 March 2020.
  20. Zaki, Meekal. Urdu Dictionary - Roman URDU To English Fun Dictionary - Searchable (Kindle Edition. ed.).
  21. FWP. "Urdu: some thoughts about the script and grammar, and other general notes for students assembled from years of classroom notes by FWP". www.columbia.edu. Retrieved 28 February 2020.
  22. "The chart below gives the different positional variants of some of the significantly different letters. (scanned document)". Linked by www.columbia.edu/itc/mealac/pritchett/00urdu/urduscript/section00.html#00_01. Retrieved 28 February 2020.
  23. "میں". ur.oxforddictionaries. Retrieved 14 March 2020.
  24. "are ہَیں Oxford Dictionaries". ur.oxforddictionaries.com. Retrieved 29 February 2020.
  25. "Lotus کنول Urdu to English Translation - Oxford Dictionaries". Oxford Urdu Living Dictionary. Retrieved 29 February 2020.
  26. "گھنٹہ". ur.oxforddictionaries. Retrieved 14 March 2020.
  27. "Urban Dictionary: ghanta". Urban Dictionary. Retrieved 14 March 2020.
  28. "ٹانگ". ur.oxforddictionaries. Retrieved 14 March 2020.
  29. "پن٘جابی". ur.oxforddictionaries. Retrieved 14 March 2020.
  30. "Proposal of Inclusion of Certain Characters in Unicode" (PDF).
  31. "Unicode Utilities: Character Properties 06C1". unicode.org. Retrieved 2 March 2020.
  32. "Unicode Utilities: Character Properties 06BE". unicode.org. Retrieved 2 March 2020.
  33. "Unicode Utilities: Character Properties 0647". unicode.org. Retrieved 2 March 2020.
  34. "Google Noto Fonts". www.google.com. Retrieved 8 March 2020.
  35. "Awami Nastaliq". software.sil.org. Retrieved 8 March 2020.
  36. "Hindi alphabet, pronunciation and language". www.omniglot.com. Retrieved 24 February 2020.
  37. Kashani, Aabid (2019). Urdu: The Ultimate Beginners Learning Guide: Master The Fundamentals Of The Urdu Language (Kindle ed.). Dirk Alan Llorens?.
  38. "Urdu writing system summary". r12a.github.io. Retrieved 15 March 2020.
  39. Jawaid, Bushra; Ahmed, Tafseer (2009). "Hindi to Urdu Conversion: Beyond Simple Transliteration" (PDF). Proceedings of the Conference on Language & Technology 2009. Retrieved 29 February 2020.
  40. "بھارت". Oxford Urdu Living Dictionary. Retrieved 15 March 2020.
  41. "بھاری". oxforddictionaries. Retrieved 13 March 2020.
  42. "Urdu: Periodic Table". www.biscuitcitypress.com. Retrieved 15 March 2020.
  43. "ٹھوس". Oxford Urdu Living Dictionary. Retrieved 15 March 2020.
  44. "گندھک". Oxford Urdu Living Dictionary. Retrieved 15 March 2020.
  45. "کھانا Urdu to English Translation - Oxford Dictionaries". Oxford Urdu Living Dictionary. Retrieved 21 March 2020.
  46. "خلا سے انڈیا کی فضا الگ کیوں دکھائی دیتی ہے؟". BBC News اردو (in Urdu). 27 June 2018. Retrieved 21 March 2020.
  47. "دِکھائی Urdu to English Translation - Oxford Dictionaries". Oxford Urdu Living Dictionary. Retrieved 21 March 2020.
  48. "ننّھا". oxford dictionaries. Retrieved 24 February 2020.
  49. "تُمھیں". Oxford Dictionaries. Retrieved 24 February 2020.
  50. "دُولھا Urdu to English Translation - Oxford Dictionaries". Oxford Urdu Living Dictionary. Retrieved 25 March 2020.
  51. "خلا Urdu to English Translation - Oxford Dictionaries". Oxford Urdu Living Dictionary. Retrieved 21 March 2020.
  52. "ملاعین Urdu to English Translation - Oxford Dictionaries". Oxford Urdu Living Dictionary. Retrieved 25 March 2020.
  53. "عِلاج Urdu to English Translation - Oxford Dictionaries". Oxford Urdu Living Dictionary. Retrieved 25 March 2020.
  54. Ballantyne, James Robert (1842). A Grammar of the Hindustani Language, with Brief Notices of the Braj and Dakhani Dialects. Madden & Company. p. 11.
  55. Berggren, Olaf (2002). Scripts. Bibliotheca Alexandrina. p. 108.
  56. "ڈاکٹر Urdu to English Translation - Oxford Dictionaries". Oxford Urdu Living Dictionary. Retrieved 26 March 2020.
  57. "Urdu dictionary". Rekhta. Retrieved 22 April 2020.
  58. "عورت آزادی مارچ کا مقصد کیا ہے؟". Independent Urdu (in Urdu). Retrieved 16 April 2020.
  59. "ویمن ڈیموکریٹک فرنٹ کا اجتماع کیوں روکا گیا؟". Independent Urdu (in Urdu).
  60. Carreiro, Heather. "Why Hindi-Urdu is one language and Arabic is several - May 28, 2010". Matador Network.
  61. "Urdu alphabet, pronunciation and language". www.omniglot.com. Retrieved 29 February 2020.
  62. Delacy 2003, p. 99–100.
  63. "اردو میں نقل حرفی ۔ ایک ابتدائی تعارف : نبلٰی پیرزادہ". nlpd.gov.pk.
  64. "IBM 868 code page"
  65. "Urdu Zabta Takhti" (PDF).
  66. "Arabic" (PDF). unicode.org. Retrieved 7 April 2019.
  67. "A dictionary, Hindustani and English". Dsal.uchicago.edu. 29 September 2009. Retrieved 18 December 2011.
  68. "A dictionary, Hindustani and English". Dsal.uchicago.edu. Retrieved 18 December 2011.
  69. "Center for Research in Urdu Language Processing". Crulp.org. Retrieved 18 December 2011.
  70. Archive index at the Wayback Machine
  71. "مائِیکروسافٹ ڈاؤُن لوڈ مَرکَزWindows". Microsoft.com. Retrieved 18 December 2011.
  72. "Ubuntu in Urdu « Aasim's Web Corner". Aasims.wordpress.com. Retrieved 18 December 2011.
  73. "E-Urdu: How one man's plea for Nastaleeq was heard by Apple". The Express Tribune. 16 October 2014. Retrieved 29 March 2015.
  74. "Urdu". Omniglot.com.

Sources

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.