I was reading about the Unicode specification on Wikipedia (Arabic Unicode)
and I see that each of the Arabic digits has two Unicode code points.
For example, 1 is defined both as U+0661 and as U+06F1.
Which one should I use?
According to the code charts, U+0660 .. U+0669 are ARABIC-INDIC DIGIT values 0 through 9, while U+06F0 .. U+06F9 are EXTENDED ARABIC-INDIC DIGIT values 0 through 9.
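If you want to check this yourself, a minimal sketch using Python's standard `unicodedata` module (my choice of tool, not something from the code charts) can print the character name and numeric value for both ranges. The helper name `describe_digits` is just for illustration.

```python
import unicodedata

def describe_digits(start, end):
    """Return (code point label, Unicode name, digit value) for each code point in start..end."""
    rows = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        rows.append((f"U+{cp:04X}", unicodedata.name(ch), unicodedata.digit(ch)))
    return rows

# U+0660..U+0669 and U+06F0..U+06F9, as listed in the code charts
for row in describe_digits(0x0660, 0x0669) + describe_digits(0x06F0, 0x06F9):
    print(*row)
```

Both ranges report digit values 0 through 9; only the names (and the glyphs your font shows) differ.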
In the Unicode 3.0 book (5.2 is the current version, but these things don’t change much once set), the U+066n series of glyphs are marked ‘Arabic-Indic digits’ and the U+06Fn series of glyphs are marked ‘Eastern Arabic-Indic digits (Persian and Urdu)’.
It also notes:
For comparison:
U+066n: ٠١٢٣٤٥٦٧٨٩
U+06Fn: ۰۱۲۳۴۵۶۷۸۹
(Whether you can see any of those, and how clearly they are differentiated may depend on your browser and the fonts installed on your machine as much as anything else. I can see the difference on 4 and 6 clearly; 5 looks much the same in both.)
Based on this information, if you are working with Arabic from the Middle East, use the U+066n series of digits; if you are working with Persian or Urdu, use the U+06Fn series of digits. As a Unicode application, you should accept either set of codes as valid digits (but you might look askance at a sequence that mixed the two sets of digits – or you might just leave well alone).
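As a rough sketch of "accept either set, but notice a mix", here is one way it could look in Python. The function names `digit_sets` and `parse_number` are made up for this example, and whether you reject, warn about, or ignore a mixed sequence is left to the caller. It relies on the fact that Python's built-in `int()` accepts any Unicode decimal digits.

```python
# Code point ranges from the code charts
ARABIC_INDIC = range(0x0660, 0x066A)            # U+0660..U+0669
EXTENDED_ARABIC_INDIC = range(0x06F0, 0x06FA)   # U+06F0..U+06F9

def digit_sets(text):
    """Return labels for which of the two digit ranges occur in text."""
    found = set()
    for ch in text:
        cp = ord(ch)
        if cp in ARABIC_INDIC:
            found.add("arabic-indic")
        elif cp in EXTENDED_ARABIC_INDIC:
            found.add("extended-arabic-indic")
    return found

def parse_number(text):
    """Parse text as an integer and report whether it mixes the two digit sets."""
    value = int(text)  # int() accepts Unicode decimal digits from either range
    mixed = len(digit_sets(text)) > 1
    return value, mixed

print(parse_number("\u0661\u0662\u0663"))  # all from U+066n
print(parse_number("\u06F1\u06F2\u06F3"))  # all from U+06Fn
print(parse_number("\u0661\u06F2\u0663"))  # mixes both sets
```

The first two calls parse to the same value with no mix reported; the third parses too, but the flag tells you the input combined both sets.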