The Archive Base

Editorial Team
Asked: June 15, 2026

I have an arbitrary Unicode string that represents a number, such as 2, ٢

I have an arbitrary Unicode string that represents a number, such as “2”, “٢” (U+0662, ARABIC-INDIC DIGIT TWO) or “Ⅱ” (U+2161, ROMAN NUMERAL TWO). I want to convert that string into an int. I don’t care about specific locales (the input might not be in the current locale); if it’s a valid number then it should get converted.

I tried QString.toInt and QLocale.toInt, but they don’t seem to get the job done. Example:

bool ok;
int n;
QString s = QChar(0x0662); // ARABIC-INDIC DIGIT TWO

n = s.toInt(&ok); // n == 0; ok == false

QLocale anyLocale(QLocale::AnyLanguage, QLocale::AnyScript, QLocale::AnyCountry);
n = anyLocale.toInt(s, &ok); // n == 0; ok == false

QLocale cLocale = QLocale::C;
n = cLocale.toInt(s, &ok); // n == 0; ok == false

QLocale arabicLocale = QLocale::Arabic; // Specific locale. I don't want that.
n = arabicLocale.toInt(s, &ok); // n == 2; ok == true

Is there a function I am missing?

I could try all locales:

QList<QLocale> allLocales = QLocale::matchingLocales(QLocale::AnyLanguage, QLocale::AnyScript, QLocale::AnyCountry);
for(int i = 0; i < allLocales.size(); i++)
{
    n = allLocales[i].toInt(s, &ok);
    if(ok)
        break;
}

But that feels slightly hackish. Also, it does not work for all strings (e.g. Roman numerals, but that’s an acceptable limitation). Are there any pitfalls when doing it that way, such as conflicting rules in different locales (cf. Turkish vs. non-Turkish letter case rules)?

1 Answer

  1. Editorial Team
    Added an answer on June 15, 2026 at 5:13 pm

    I’m not aware of any ready-to-use package which does this (but
    maybe ICU supports it), but it isn’t hard to do if you really
    want to. First, you should download the UnicodeData.txt file
    from http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
    This is an easy-to-parse ASCII file; the exact syntax is
    described in http://www.unicode.org/reports/tr44/tr44-10.html,
    but for your purposes, all you need to know is that each line in
    the file consists of semicolon-separated fields. The first
    field contains the character code in hex, the third field the
    “general category”, and if the third field is “Nd” (number,
    decimal digit), the seventh field (index 6) contains the decimal
    value.
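To make the field layout concrete, here is a minimal sketch that parses a few lines in the UnicodeData.txt format and keeps only the "Nd" entries. The three sample lines are illustrative and should be checked against the real file; the name parse_decimal_digits is hypothetical, not part of any library.

```python
# Illustrative lines in the UnicodeData.txt layout (semicolon-separated fields).
# They are assumptions for this sketch; verify them against the real file.
SAMPLE = """\
0032;DIGIT TWO;Nd;0;EN;;2;2;2;N;;;;;
0662;ARABIC-INDIC DIGIT TWO;Nd;0;AN;;2;2;2;N;;;;;
2161;ROMAN NUMERAL TWO;Nl;0;L;<compat> 0049 0049;;;2;N;;;;;
"""


def parse_decimal_digits(lines):
    """Return {code point: decimal value} for every line whose category is 'Nd'."""
    digits = {}
    for line in lines:
        fields = line.rstrip("\n").split(";")
        # fields[0] = code point (hex), fields[2] = general category,
        # fields[6] = decimal digit value (only filled in for 'Nd').
        if len(fields) > 6 and fields[2] == "Nd":
            digits[int(fields[0], 16)] = int(fields[6])
    return digits


table = parse_decimal_digits(SAMPLE.splitlines())
```

The Roman numeral line is skipped because its category is "Nl", which matches the limitation accepted in the question.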

    This file can easily be parsed using Python or a number of other
    scripting languages, to build a mapping table. You’ll want some
    sort of sparse representation, since there are over a million
    Unicode code points, of which only a few hundred are decimal
    digits. The Python script at the end of this answer will give
    you a C++ table which can be used to initialize a
    std::map<int, int>. If the character is in the map, the mapped
    element is its value.
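The lookup itself can be sketched as follows, in Python rather than C++ to stay with the scripting language used in this answer. DIGITS is a hypothetical miniature table covering only three digit sets; a real table would be generated from UnicodeData.txt. The function name to_int is likewise an assumption.

```python
# Hypothetical miniature table: code point -> decimal value.
# A real table would be generated from UnicodeData.txt.
DIGITS = {}
for zero in (0x0030, 0x0660, 0x0966):  # ASCII, Arabic-Indic, Devanagari zeros
    for value in range(10):
        DIGITS[zero + value] = value


def to_int(text, digits=DIGITS):
    """Return the integer spelled by text, or None if any character
    is missing from the digit table (or the text is empty)."""
    if not text:
        return None
    result = 0
    for ch in text:
        value = digits.get(ord(ch))
        if value is None:
            return None  # not a known decimal digit
        result = result * 10 + value
    return result
```

With this table, to_int("\u0662") gives 2, and a Roman numeral such as "\u2161" gives None. The same loop translates directly to a lookup in a std::map<int, int>.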

    Whether this is sufficient or not depends on your application.
    It has several weaknesses:

    • It requires extra logic to recognize when two successive
      digits are in different alphabets. Presumably a sequence "1١"
      should be treated as two numbers (1 and 1), rather than as one
      (11). (Because all of the sets of decimal digits are in 10
      successive codes, it would be fairly easy, once you know the
      digit, to check whether the preceding digit character was in the
      same set.)

    • It ignores non-decimal digits, like ௰ or ൱ (Tamil ten and
      Malayalam one hundred). There aren’t that many of them, and they
      are also in the UnicodeData.txt file, so it might be possible to
      find them manually and add them to the table. I don’t know
      myself, however, how they combine with other digits when numbers
      are composed.

    • If you’re converting numbers, you might have to worry about
      the direction. Unicode text is stored in logical order, and as
      far as I know the most significant digit comes first in memory
      even in right-to-left scripts such as Arabic; only the display
      order is handled by the bidirectional algorithm. So something
      like "١٢" (U+0661 followed by U+0662) should be interpreted as
      12, and not 21, without reversing anything. (The exact rules
      are described in the documentation at the Unicode site; in the
      UnicodeData.txt file, the fifth field (index 4) gives the
      bidirectional class, which is "AN" for the Arabic-Indic
      digits.)
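The first weakness (digits from different sets in one string) can be handled with the "10 successive codes" observation: the code point minus the digit value is the zero of that digit set, so every digit in one number should yield the same zero. A minimal sketch, assuming a hypothetical miniature DIGITS table and assuming the digits are stored most significant first; it rejects mixed strings rather than splitting them into several numbers, which is a design choice rather than the only option.

```python
# Hypothetical miniature table: code point -> decimal value.
DIGITS = {zero + v: v
          for zero in (0x0030, 0x0660, 0x0966)  # ASCII, Arabic-Indic, Devanagari
          for v in range(10)}


def to_int_same_set(text, digits=DIGITS):
    """Convert text to int, returning None if a character is not a known
    decimal digit or if the digits come from more than one digit set."""
    if not text:
        return None
    result = 0
    expected_zero = None
    for ch in text:
        cp = ord(ch)
        value = digits.get(cp)
        if value is None:
            return None
        zero = cp - value  # code point of this digit set's zero
        if expected_zero is None:
            expected_zero = zero
        elif zero != expected_zero:
            return None  # e.g. an ASCII digit followed by an Arabic-Indic one
        result = result * 10 + value
    return result
```

A tokenizer that prefers to treat "1١" as two numbers could start a new number at the point where this sketch returns None.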

    Just to show how simple this is, here’s the Python script to
    parse the UnicodeData.txt file for the digit values:

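    # Emit a C++ initializer table mapping each decimal digit's code point to its value.
    print('std::pair<int, int> initUnicodeMap[] = {')
    with open("UnicodeData.txt") as data:
        for line in data:
            fields = line.split(';')
            if fields[2] == 'Nd':
                # fields[6] is the decimal digit value (the seventh field)
                print('    {{{:d}, {:d}}},'.format(int(fields[0], 16), int(fields[6])))
    print('};')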
    

    If you’re doing any work with Unicode, this file is a gold mine
    for generating all sorts of useful tables.
