this is a question about unicode characters in a text input file. This discussion

Question

0

Asked: June 17, 20262026-06-17T16:38:29+00:00 2026-06-17T16:38:29+00:00

this is a question about unicode characters in a text input file. This discussion

0

this is a question about unicode characters in a text input file. This discussion was close but not quite the answer. Compiled with VS2008 and executed on Windows these charcters are recognized on read (maybe represented as a different symbol but read) – compiled with g++and executed on linux they are displayed as blank.

‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž ‘ ’ “ ” • – — ˜ ™ š › œ ž Ÿ

The rest of the Unicode symbols appear to work fine, I did not check them all but found this set did not work.

Questions:
(1) Why?
(2) Is there a solution?

void Lexicon::buildMapFromFile(string filename )  //map
{
    ifstream file;
    file.open(filename.c_str(), ifstream::binary);
    string wow, mem, key;
    unsigned int x = 0;

    while(true) {
        getline(file, wow);
        cout << wow << endl;
        if (file.fail()) break; //boilerplate check for error
        while (x < wow.length() ) {
            if (wow[x] == ',') { //look for csv deliniator
                key = mem;
                mem.clear();
                x++; //step over ','
            } else 
                mem += wow[x++];
        }

        //cout << mem << " code " << key << " is " << (key[0] - '€') << " from €" << endl;

        cout << "enter 1 to continue: ";
        while (true) {
            int choice = GetInteger();
            if (choice == 1) break;
        }

        list_map0[key] = mem; //char to string
        list_map1[mem] = key; //string to char
        mem.clear(); //reset memory
        x = 0;//reset index
    }
    //printf("%d\n", list_map0.size());
    file.close();
}

The unicode symbols are read from a csv file and parsed for the unicode symbol and an associated string. Initially I though there was a bug in the code but in this post the review found it is fine and I followed the issue to how the characters are handled.

The test is cout << wow << endl;

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-17T16:38:31+00:00

The characters you show are all characters from Windows codepage 1252 which do not exist in the ISO-8859 1 encoding. These two encodings are similar and so are often confused.

If the input is CP1252 and you are reading it as though it were ISO-8859 1 then those characters are read as control characters and will not behave as normal, visible characters.

There are many possible things you could be doing incorrectly to cause this, but you’ll have to post more details in order to determine which. A more complete answer requires knowing how you are reading the data, how you are converting and storing it internally, how you are testing the read data, and the input data and/or encoding.

Your displayed code doesn’t do any conversions while reading the data, and the commented-out code to print the data is the same; no conversions. This means to print the data you are simply relying on the input data to be correct for the platform you run the program on. That means that, for example, if you run your program in the console on Windows then your input file needs to be encoded using the console’s codepage*.

To fix the problem you can either; ensure the input file matches the encoding required by the particular console you run the program on; or specify the input encoding, convert to a known internal encoding when reading and then convert to the console encoding when printing.

_{* and if it’s not, for example if the console is cp437 and the file is cp1252 then the characters you listed will instead show up as: É æ Æ ô ö ò û ù ÿ Ö Ü ¢ £ ¥ ₧ ƒ á í ó ú ñ Ñ ª º ¿ ⌐ ¬ ½ ¼ ¡ « »}

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

this is a question about unicode characters in a text input file. This discussion

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply