MIK (character set)

MIK (МИК) is an 8-bit Cyrillic code page used with DOS. It is based on the character set used in the Bulgarian Pravetz 16[1] IBM PC compatible system. Kermit calls this character set "BULGARIA-PC" / "bulgaria-pc".[2][3][4] In Bulgaria, it was sometimes incorrectly referred to as code page 856 (which clashes with IBM's definition for a Hebrew code page).

MIK, rather than CP 808, CP 855, CP 866 or CP 872, is the most widespread DOS/OEM code page used in Bulgaria.

Almost every DOS program created in Bulgaria that contains Bulgarian strings used MIK as its encoding, and many such programs are still in use.

Character set

Each character is shown with its equivalent Unicode code point and its decimal code point. Only the second half of the table (code points 128–255) is shown, the first half (code points 0–127) being the same as ASCII.


MIK[5][6][4]

    _0    _1    _2    _3    _4    _5    _6    _7    _8    _9    _A    _B    _C    _D    _E    _F

8_  А     Б     В     Г     Д     Е     Ж     З     И     Й     К     Л     М     Н     О     П
    0410  0411  0412  0413  0414  0415  0416  0417  0418  0419  041A  041B  041C  041D  041E  041F
    128   129   130   131   132   133   134   135   136   137   138   139   140   141   142   143

9_  Р     С     Т     У     Ф     Х     Ц     Ч     Ш     Щ     Ъ     Ы     Ь     Э     Ю     Я
    0420  0421  0422  0423  0424  0425  0426  0427  0428  0429  042A  042B  042C  042D  042E  042F
    144   145   146   147   148   149   150   151   152   153   154   155   156   157   158   159

A_  а     б     в     г     д     е     ж     з     и     й     к     л     м     н     о     п
    0430  0431  0432  0433  0434  0435  0436  0437  0438  0439  043A  043B  043C  043D  043E  043F
    160   161   162   163   164   165   166   167   168   169   170   171   172   173   174   175

B_  р     с     т     у     ф     х     ц     ч     ш     щ     ъ     ы     ь     э     ю     я
    0440  0441  0442  0443  0444  0445  0446  0447  0448  0449  044A  044B  044C  044D  044E  044F
    176   177   178   179   180   181   182   183   184   185   186   187   188   189   190   191

C_  └     ┴     ┬     ├     ─     ┼     ╣     ║     ╚     ╔     ╩     ╦     ╠     ═     ╬     ┐
    2514  2534  252C  251C  2500  253C  2563  2551  255A  2554  2569  2566  2560  2550  256C  2510
    192   193   194   195   196   197   198   199   200   201   202   203   204   205   206   207

D_  ░     ▒     ▓     │     ┤     №     §     ╗     ╝     ┘     ┌     █     ▄     ▌     ▐     ▀
    2591  2592  2593  2502  2524  2116  00A7  2557  255D  2518  250C  2588  2584  258C  2590  2580
    208   209   210   211   212   213   214   215   216   217   218   219   220   221   222   223

E_  α     ß     Γ     π     Σ     σ     µ     τ     Φ     Θ     Ω     δ     ∞     φ     ε     ∩
    03B1  00DF  0393  03C0  03A3  03C3  00B5  03C4  03A6  0398  03A9  03B4  221E  03C6  03B5  2229
    224   225   226   227   228   229   230   231   232   233   234   235   236   237   238   239

F_  ≡     ±     ≥     ≤     ⌠     ⌡     ÷     ≈     °     ∙     ·     √     ⁿ     ²     ■     NBSP
    2261  00B1  2265  2264  2320  2321  00F7  2248  00B0  2219  00B7  221A  207F  00B2  25A0  00A0
    240   241   242   243   244   245   246   247   248   249   250   251   252   253   254   255

Notes: [nb 1] applies to E1 (225), [nb 2] to E4 (228), [nb 3] to E6 (230), [nb 4] to EA (234), [nb 5] to EE (238).
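
The table can be turned into a lookup list fairly directly. The following Python sketch is only an illustration: the names MIK_TABLE, mik_decode and mik_encode are hypothetical and do not belong to any library, and the code assumes the primary Unicode mappings listed in the table (for example U+00DF for byte 225).

```python
# Bytes 0xC0-0xFF, transcribed from the table as Unicode code points.
_HIGH = [
    0x2514, 0x2534, 0x252C, 0x251C, 0x2500, 0x253C, 0x2563, 0x2551,
    0x255A, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256C, 0x2510,
    0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x2116, 0x00A7, 0x2557,
    0x255D, 0x2518, 0x250C, 0x2588, 0x2584, 0x258C, 0x2590, 0x2580,
    0x03B1, 0x00DF, 0x0393, 0x03C0, 0x03A3, 0x03C3, 0x00B5, 0x03C4,
    0x03A6, 0x0398, 0x03A9, 0x03B4, 0x221E, 0x03C6, 0x03B5, 0x2229,
    0x2261, 0x00B1, 0x2265, 0x2264, 0x2320, 0x2321, 0x00F7, 0x2248,
    0x00B0, 0x2219, 0x00B7, 0x221A, 0x207F, 0x00B2, 0x25A0, 0x00A0,
]

# 0x00-0x7F: same as ASCII.
# 0x80-0xBF: the 64 Cyrillic letters U+0410..U+044F, in order.
# 0xC0-0xFF: box drawing, Greek and mathematical symbols.
MIK_TABLE = (
    [chr(i) for i in range(0x80)]
    + [chr(0x0410 + i) for i in range(0x40)]
    + [chr(cp) for cp in _HIGH]
)

# Reverse mapping for encoding; every entry of MIK_TABLE is distinct.
_MIK_ENCODE = {ch: byte for byte, ch in enumerate(MIK_TABLE)}


def mik_decode(data: bytes) -> str:
    """Decode MIK-encoded bytes into a Unicode string."""
    return "".join(MIK_TABLE[b] for b in data)


def mik_encode(text: str) -> bytes:
    """Encode text as MIK; raises KeyError for characters not in the table."""
    return bytes(_MIK_ENCODE[ch] for ch in text)
```

Because all 256 entries are distinct, encoding the decoded form of any byte string gives back the original bytes.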

Notes for implementors of mapping tables to Unicode

Implementors of mapping tables to Unicode should note that the MIK code page unifies some characters, so that a single byte stands for two distinct Unicode characters:

  1. 0xE1 is both the German sharp S (U+00DF, ß) and the Greek lowercase beta (U+03B2, β);
  2. 0xE4 is both the n-ary summation sign (U+2211, ∑) and the Greek uppercase sigma (U+03A3, Σ);
  3. 0xE6 is both the micro sign (U+00B5, µ) and the Greek lowercase mu (U+03BC, μ);
  4. 0xEA is both the Ohm sign (U+2126, Ω) and the Greek uppercase omega (U+03A9, Ω);
  5. 0xEE is both the element-of sign (U+2208, ∈) and the Greek lowercase epsilon (U+03B5, ε).
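
One way an encoder might handle these unified characters is to fold each alternative onto the character used in the table above before looking up the byte. The sketch below assumes that choice of primary character; the names MIK_ALIASES and normalize_for_mik are hypothetical.

```python
# Each key is the "other" Unicode character sharing a MIK byte;
# each value is the character the table above lists for that byte.
MIK_ALIASES = {
    0x03B2: 0x00DF,  # Greek small beta      -> sharp s      (byte 0xE1)
    0x2211: 0x03A3,  # n-ary summation       -> capital sigma (byte 0xE4)
    0x03BC: 0x00B5,  # Greek small mu        -> micro sign   (byte 0xE6)
    0x2126: 0x03A9,  # Ohm sign              -> capital omega (byte 0xEA)
    0x2208: 0x03B5,  # element-of sign       -> small epsilon (byte 0xEE)
}


def normalize_for_mik(text: str) -> str:
    """Replace unified characters with the variant listed in the table."""
    return text.translate(MIK_ALIASES)
```

Decoding is lossy in the other direction: a decoder has to pick one of the two characters for each of these bytes, and the table above picks the values on the right-hand side of the mapping.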

Binary character manipulations

The MIK code page keeps all Cyrillic letters in alphabetical order, which makes character manipulation in binary form very easy:

10xx xxxx - is a Cyrillic Letter

100x xxxx - is an Upper-case Cyrillic Letter

101x xxxx - is a Lower-case Cyrillic Letter

With this layout, character testing and manipulation functions such as IsAlpha(), IsUpper(), IsLower(), ToUpper() and ToLower() reduce to bit operations, and sorting is a simple comparison of character values.
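
The bit patterns above can be sketched in Python as follows. The function names are illustrative only, and the functions deliberately handle just the Cyrillic range (bytes 0x80 to 0xBF); ASCII letters would need their own checks.

```python
def is_cyrillic(b: int) -> bool:
    """10xx xxxx: any Cyrillic letter."""
    return (b & 0xC0) == 0x80


def is_upper(b: int) -> bool:
    """100x xxxx: upper-case Cyrillic letter."""
    return (b & 0xE0) == 0x80


def is_lower(b: int) -> bool:
    """101x xxxx: lower-case Cyrillic letter."""
    return (b & 0xE0) == 0xA0


def to_upper(b: int) -> int:
    """Clear bit 5 of a lower-case Cyrillic letter; leave other bytes alone."""
    return b & 0xDF if is_lower(b) else b


def to_lower(b: int) -> int:
    """Set bit 5 of an upper-case Cyrillic letter; leave other bytes alone."""
    return b | 0x20 if is_upper(b) else b


def sort_mik(data: bytes) -> bytes:
    """Sort by raw byte value, which is alphabetical within each case."""
    return bytes(sorted(data))
```

Note that plain byte comparison places every upper-case letter before every lower-case one; a case-insensitive sort could apply to_upper() to each byte first.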

References

  1. "Pravetz 16". Archived from the original on 2016-12-06. Retrieved 2016-12-06.
  2. da Cruz, Frank (2010-04-02). "Kermit and MIME Character-Set Names". The Kermit Project. Columbia University, New York, USA. Archived from the original on 2016-12-02. Retrieved 2016-12-02.
  3. http://www.kermitproject.org/k95manual/cyrillic.html
  4. ftp://kermit.columbia.edu/kermit/charsets/cp856.txt
  5. Czyborra, Roman (1998-11-30) [1998-05-25]. "The Cyrillic Charset Soup". Archived from the original on 2016-12-03. Retrieved 2016-12-03.
  6. Hohlov, Yu. E. "Cyrillic Information Representation in Electronic Form - Character Set (Code Page) Tables". Archived from the original on 2016-12-05. Retrieved 2016-12-05.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.