I am currently working with some Hungarian data.
I have to sort a list of Hungarian strings.
According to this Collation Sequence page, the
Hungarian alphabetic order is: A=Á, B, C, CS, D, DZ, DZS, E=É, F, G,
GY, H, I=Í, J, K, L, LY, M, N, NY, O=Ó, Ö=Ő, P, Q, R, S, SZ, T, TY,
U=Ú, Ü=Ű, V, W, X, Y, Z, ZS
So accented and unaccented vowels are treated the same (A=Á, …), and sorting with Collator can give a result like this:
Abdffg
Ádsdfgsd
Aegfghhrf
Up to here, no problem 🙂
But now I have a requirement to sort according to the Hungarian alphabet:
A Á B C Cs D Dz Dzs E É F G Gy H I Í J K L Ly M N Ny O Ó Ö Ő P (Q) R S
Sz T Ty U Ú Ü Ű V (W) (X) (Y) Z Zs
Here, A is considered different from Á.
Playing with the strength of the Collator doesn't change the order of the output. A and Á are still mixed up.
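A likely explanation, sketched under assumptions: if the collation rules treat Á as only a secondary (accent-level) variant of A, then the accent is used purely as a tie-breaker, and any primary difference later in the word decides first. Raising the strength only enables more tie-breakers; it never promotes the accent to a letter of its own. The toy rule string below is hypothetical (it is not the real Hungarian table) and the class and method names are made up for illustration.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;
import java.util.List;

public class AccentStrengthDemo {
    // Toy rules (an assumption, not the real Hungarian rules).
    // ';' declares a-acute as a secondary (accent) variant of a;
    // ',' declares the upper-case form as a tertiary variant.
    // The accented letters are written as Unicode escapes.
    static final String TOY_RULES =
        "< a,A ; \u00E1,\u00C1 < b,B < d,D < e,E < f,F < g,G < h,H < r,R < s,S";

    static RuleBasedCollator toyCollator(int strength) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(TOY_RULES);
            collator.setStrength(strength);
            return collator;
        } catch (ParseException e) {
            throw new IllegalStateException("Invalid toy rules", e);
        }
    }

    // Sorts the three sample words from the question with the given strength.
    static List<String> sortWith(int strength) {
        List<String> words = Arrays.asList("Aegfghhrf", "\u00C1dsdfgsd", "Abdffg");
        words.sort(toyCollator(strength));
        return words;
    }

    public static void main(String[] args) {
        // Both strengths give the same order, because the second letter
        // (b, d, e) is a primary difference and is decided before the accent.
        System.out.println(sortWith(Collator.PRIMARY));
        System.out.println(sortWith(Collator.TERTIARY));
    }
}
```

With these toy rules, the accent only matters when two words are otherwise identical at the primary level (for example "Ab" and "Áb"), which matches the mixed A/Á output shown above.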
Are there any libraries or tricks to sort a list of strings according to the Hungarian alphabetical order?
So far, what I am doing is:
- Sort with Collator so that C/CS, D/DZ/DZS, … are sorted correctly
- Sort again by comparing the first character of each word based on a map
This looks like too much hassle for the task, no?
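import java.text.Collator;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

List<String> words = Arrays.asList(
    "Árfolyam", "Az",
    "Állásajánlatok", "Adminisztráció",
    "Zsfgsdgsdfg", "Qdfasfas"
);
// Rank of each vowel, so that the accented form sorts after the plain one
final Map<String, Integer> map = new HashMap<String, Integer>();
map.put("A", 0);
map.put("Á", 1);
map.put("E", 2);
map.put("É", 3);
map.put("O", 4);
map.put("Ó", 5);
map.put("Ö", 6);
map.put("Ő", 7);
map.put("U", 8);
map.put("Ú", 9);
map.put("Ü", 10);
map.put("Ű", 11);
final Collator c = Collator.getInstance(new Locale("hu"));
c.setStrength(Collator.TERTIARY);
// First pass: let the Hungarian collator handle CS, DZ, DZS, ...
Collections.sort(words, c);
// Second pass (stable sort): reorder by the first character only, using the map
Collections.sort(words, new Comparator<String>() {
    public int compare(String s1, String s2) {
        int f = c.compare(s1, s2);
        if (f == 0) return 0;
        String a = Character.toString(s1.charAt(0));
        String b = Character.toString(s2.charAt(0));
        Integer rankA = map.get(a);
        Integer rankB = map.get(b);
        if (rankA != null && rankB != null) {
            // compareTo avoids comparing boxed Integers by reference with ==
            return rankA.compareTo(rankB);
        }
        // At least one first character is not in the map: keep the collator order
        return 0;
    }
});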
Thanks for your input
I found a good idea: you can use a RuleBasedCollator.
Source: http://download.oracle.com/javase/tutorial/i18n/text/rule.html
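To illustrate the idea before the full rule, here is a minimal sketch that covers only a small, hypothetical subset of the alphabet. The rule string, class name and sample words are assumptions made for the example. In `RuleBasedCollator` rules, `<` introduces a new letter (a primary difference), so writing `a,A < á,Á` makes Á a letter of its own that sorts after every word starting with A, and listing `cs` as an entry makes the digraph sort as a single letter after C.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;
import java.util.List;

public class HungarianSubsetSort {
    // Hypothetical subset of the alphabet, for illustration only.
    // '<' starts a new letter, ',' separates case variants, and "cs" is
    // declared as one letter placed after "c". Accented letters are
    // written as Unicode escapes (a-acute, e-acute).
    static final String SUBSET_RULES =
        "< a,A < \u00E1,\u00C1 < b,B < c,C < cs,Cs,CS < d,D "
        + "< e,E < \u00E9,\u00C9 < z,Z";

    // Sorts the given words in place with the subset rules and returns them.
    static List<String> sortSubset(List<String> words) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(SUBSET_RULES);
            words.sort(collator);
            return words;
        } catch (ParseException e) {
            throw new IllegalStateException("Invalid subset rules", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample words: A-acute + b, Cs, Ad, Cz, Ab
        List<String> words = Arrays.asList("\u00C1b", "Cs", "Ad", "Cz", "Ab");
        System.out.println(sortSubset(words));
    }
}
```

Because the rules alone define the order, a single sort is enough and the second pass with a map of first characters is no longer needed. Characters that are not listed in the rules sort after all listed ones, so a real rule string has to cover the whole alphabet.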
And here is the Hungarian rule: