I have an application that displays user data in a sorted list. The list has an index, which in English is the letters A-Z. Tapping a letter in the index jumps to the items starting with that letter. This works for English-like languages, but fails completely for languages that use different character sets (such as Chinese).
I can use ICU to collate the items into the correct order, but how can I find a correct set of index characters for other languages? Note that I don't know the entire list ahead of time, so generating the index from the data is not possible.
The indexes could be precalculated for each supported language, but in that case, where would I find such lists?
The “index characters” information in CLDR exists for exactly this purpose:
“The index characters are an ordered list of characters for use as a UI “index”, that is, a list of clickable characters (or character sequences) that allow the user to see a segment of a larger “target” list.”
(http://www.unicode.org/reports/tr35/#Character_Elements)
I’m afraid this information isn’t in ICU yet, but if you only need it for a few languages, you could copy the data from
http://unicode.org/repos/cldr-tmp/trunk/diff/by_type/misc.indexCharacters.html
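Once you have an ordered list of index characters for a language, the jump behavior from the question can be driven by it. Below is a minimal sketch in Python under stated assumptions: `simple_sort_key`, `index_positions` and `segment` are hypothetical helpers, not part of ICU or CLDR; the accent-stripping key is only a stand-in for a real locale-specific ICU collator so the sketch runs on the standard library alone (it will not order Chinese correctly); and the labels are plain English A-Z for illustration, not data copied from CLDR. The key idea is to compare each label with the items using the same collation key, so a label matches the first item that sorts at or after it.

```python
import bisect
import unicodedata


def simple_sort_key(text: str) -> str:
    """Hypothetical stand-in for an ICU collation key: case-fold and strip accents."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_positions(sorted_items, labels, key=simple_sort_key):
    """Map each index label to the position of the first item sorting at or after it.

    `sorted_items` must already be sorted with the same `key` used here.
    """
    item_keys = [key(item) for item in sorted_items]
    return {label: bisect.bisect_left(item_keys, key(label)) for label in labels}


def segment(sorted_items, labels, positions, label):
    """Return the items between `label` and the next label in the index."""
    start = positions[label]
    i = labels.index(label)
    end = positions[labels[i + 1]] if i + 1 < len(labels) else len(sorted_items)
    return sorted_items[start:end]


if __name__ == "__main__":
    labels = [chr(c) for c in range(ord("A"), ord("Z") + 1)]  # illustrative English A-Z
    items = sorted(["banana", "Apple", "Émile", "cherry", "avocado", "Zoe"],
                   key=simple_sort_key)
    positions = index_positions(items, labels)
    print(positions["A"], positions["C"], positions["E"])  # prints: 0 3 4
    print(segment(items, labels, positions, "A"))          # prints: ['Apple', 'avocado']
```

Because the positions come from a binary search rather than an exact first-letter match, a label with no items (here "D") yields an empty segment and tapping it lands on the next non-empty section, which is usually what a UI index wants. In a real application, both the sort and the label lookup would use the same ICU collator for the user's locale.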