I have a problem comparing strings. I want to compare the two French strings “éd” and “ef” like this:
Collator localeSpecificCollator = Collator.getInstance(Locale.FRANCE);
CollationKey a = localeSpecificCollator.getCollationKey("éd");
CollationKey b = localeSpecificCollator.getCollationKey("ef");
System.out.println(a.compareTo(b));
This prints -1, but in the French alphabet e comes before é. However, when I compare only é and e like this:
Collator localeSpecificCollator = Collator.getInstance(Locale.FRANCE);
CollationKey a = localeSpecificCollator.getCollationKey("é");
CollationKey b = localeSpecificCollator.getCollationKey("e");
System.out.println(a.compareTo(b));
the result is 1. Can you tell me what is wrong with the first part of the code?
This seems to be the expected behaviour and it also seems to be the correct way to sort alphabetically in French.
The Android javadoc gives a hint as to why it behaves like that (I suppose the Android implementation is similar, if not identical, to the standard JDK's):
In other words, because your two strings can already be ordered by primary differences alone (the base letters, ignoring accents: d comes before f), the collator never looks at the secondary differences (the accents).
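To check this on a standard JDK (Android's implementation may differ), here is a minimal, self-contained sketch that runs the question's two comparisons and then repeats them with the collator's strength set to `Collator.PRIMARY`, where accents are ignored entirely. The class name `CollationLevels` is only for illustration; the accented letter is written as the escape `\u00e9` so the example does not depend on the source file encoding.

```java
import java.text.Collator;
import java.util.Locale;

public class CollationLevels {
    public static void main(String[] args) {
        // Default strength is TERTIARY: base letters first, then accents, then case.
        Collator fr = Collator.getInstance(Locale.FRANCE);

        // Primary difference on the second letter (d < f), so the accent on
        // the first letter is never consulted: "éd" sorts before "ef".
        System.out.println(Integer.signum(fr.compare("\u00e9d", "ef"))); // -1

        // No primary difference at all: only the accent (a secondary
        // difference) separates the strings, and é sorts after e.
        System.out.println(Integer.signum(fr.compare("\u00e9", "e")));   // 1

        // With PRIMARY strength only base letters count, so é and e are equal.
        Collator primaryOnly = Collator.getInstance(Locale.FRANCE);
        primaryOnly.setStrength(Collator.PRIMARY);
        System.out.println(primaryOnly.compare("\u00e9", "e"));          // 0
        System.out.println(Integer.signum(primaryOnly.compare("\u00e9d", "ef"))); // -1
    }
}
```

The first comparison gives the same result at both strengths, which is the point: it is decided before accents ever come into play.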
It seems to be compliant with the Unicode Collation Algorithm (UCA):
It also seems to be the correct way to sort alphabetically in French, according to the Wikipedia article “Ordre alphabétique”:
In English: the order initially ignores accents and case; if two words cannot be sorted that way, accents and case are then taken into account.
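That rule can be sketched by hand as a two-level comparator, assuming a simplified model: compare the words with accents and case removed, and only fall back to the raw strings as a tie-breaker. The names `FrenchOrder`, `stripAccents` and `ACCENTS_LAST` are hypothetical, and this is not a full UCA implementation (for example, it ignores French backwards accent ordering and puts uppercase before lowercase on ties). The sketch sorts the same three words by plain `String` order, by the hand-rolled comparator, and by the JDK French `Collator` for contrast.

```java
import java.text.Collator;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class FrenchOrder {
    // Hypothetical helper: decompose (NFD) so é becomes e + combining accent,
    // then drop all combining marks.
    static String stripAccents(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Level 1: letters only (accents stripped, case folded).
    // Level 2: raw UTF-16 order, used only when level 1 is a tie.
    static final Comparator<String> ACCENTS_LAST =
            Comparator.comparing((String s) -> stripAccents(s).toLowerCase(Locale.ROOT))
                      .thenComparing(Comparator.naturalOrder());

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ef", "\u00e9d", "ed");

        // Plain String order: é (U+00E9) sorts after every ASCII letter.
        List<String> byCodeUnit = new ArrayList<>(words);
        byCodeUnit.sort(Comparator.naturalOrder());
        System.out.println(byCodeUnit); // [ed, ef, éd]

        // Accents only break ties, so éd lands between ed and ef.
        List<String> byHand = new ArrayList<>(words);
        byHand.sort(ACCENTS_LAST);
        System.out.println(byHand);     // [ed, éd, ef]

        // The JDK French collator gives the same order for these words.
        List<String> byCollator = new ArrayList<>(words);
        byCollator.sort(Collator.getInstance(Locale.FRANCE));
        System.out.println(byCollator); // [ed, éd, ef]
    }
}
```

In practice `Collator` is the better choice; the hand-written comparator is only there to make the "letters first, accents second" rule visible.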