The problem I’m having is that I need to sort a whole bunch of char pointers, but they have special characters. I managed to get a sorting procedure like so:
std::sort(dict_.begin(), dict_.end(), comp);
bool comp(NumPair& a, NumPair& b)
{
    return boost::lexicographical_compare(a.pFirst, b.pFirst);
}
This worked great, except that all special German characters were sorted before all the others. My teacher (yes, this pertains to a homework assignment), however, wants them to be sorted at the end. Awesome!
So I was playing around and thought I could use a trick I saw on a website: enabling a regional locale so that the special characters are handled, like so:
return boost::lexicographical_compare(a.pFirst, b.pFirst, locale("german"));
Didn’t work! So:
bool comp(NumPair& a, NumPair& b)
{
    setlocale(LC_ALL, "");
    return boost::lexicographical_compare(a.pFirst, b.pFirst);
}
Didn’t work!
If you have them, I would love to hear some other ideas that might actually work.
Update:
As requested, some sample input and output:
// Some entries
dict_.push_back( NumPair ( "öffnen", "to open" ) );
dict_.push_back( NumPair ( "überraschen", "to surprise" ) );
dict_.push_back( NumPair ( "wünschen", "to wish, to desire, to want" ) );
dict_.push_back( NumPair ( "widersprechen", "to contradict" ) );
// NumPair ctor.
NumPair( const char *pFirst, const char *pSecond )
{
/* Deep copy of pFirst and pSecond */
}
Output after sorting:
öffnen
überraschen
wünschen
widersprechen
You might want to show more of your code, in particular exactly which strings are causing this problem. I’m easily able to sort a set of German words, and any words beginning with non-ASCII special German characters are ordered at the end. This happens even without any special German locale settings, since in Unicode all non-ASCII characters have higher code point values than ASCII characters.
For example:
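A minimal sketch of such a sort, not the original listing. It assumes the four sample words from the question, written with universal character names, and a hypothetical helper named sort_by_code_point:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: sorts the words with std::wstring's operator<,
// which compares the strings one wchar_t element at a time.
std::vector<std::wstring> sort_by_code_point(std::vector<std::wstring> words)
{
    std::sort(words.begin(), words.end());
    return words;
}

// The sample words from the question. \u00F6 is o-umlaut and \u00FC is
// u-umlaut, so the source file itself stays pure ASCII.
std::vector<std::wstring> sample_words()
{
    return {
        L"\u00F6ffnen",
        L"\u00FCberraschen",
        L"w\u00FCnschen",
        L"widersprechen",
    };
}
```

With these four words the sorted order is widersprechen, wünschen, öffnen, überraschen: i (U+0069) precedes ü (U+00FC), and both ö (U+00F6) and ü lie above every ASCII letter.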
This outputs:
Note the use of wide character strings. Lexicographical comparison routines compare element by element, so you need to use wide characters or else the comparison function will end up comparing the strings byte by byte instead of character by character. That can produce wrong results, since not every Unicode character can be stored in a single byte: the special German characters, for example, take 2 bytes each in UTF-8. In addition, where plain char is signed, bytes above 0x7F compare as negative values and therefore sort before every ASCII character. So you need a data type capable of holding the range 0x0000 to 0xFFFF in a single element. On most platforms, wchar_t is sufficient for this.
(Also note that it’s not good practice to include non-ASCII characters in source code. Use “universal character names” such as \u00F6 instead. I’m just using non-ASCII source here for clarity.)
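For the narrow char pointers in the question, a byte-wise comparison can still place these words last if each byte is read as unsigned char. A minimal sketch, assuming the strings are stored as Latin-1 or UTF-8; less_unsigned_bytes is a hypothetical name, not part of Boost or of the question’s code:

```cpp
// Hypothetical comparator: returns true if the NUL-terminated string a
// orders before b when every byte is interpreted as unsigned char.
bool less_unsigned_bytes(const char* a, const char* b)
{
    // Skip the common prefix; stop at the end of a or at the first difference.
    while (*a != '\0' && *a == *b) {
        ++a;
        ++b;
    }
    // Bytes above 0x7F now compare as 128..255 rather than as negative
    // values, so they order after every ASCII byte.
    return static_cast<unsigned char>(*a) < static_cast<unsigned char>(*b);
}
```

std::strcmp also compares bytes as unsigned char, so testing std::strcmp(a.pFirst, b.pFirst) < 0 inside comp should give the same ordering.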