Python sorts by byte value by default, which means é comes after z and other equally funny things. What is the best way to sort alphabetically in Python?
Is there a library for this? I couldn’t find anything. Preferably sorting should have language support, so it understands that åäö should be sorted after z in Swedish, but that ü should be sorted with u, etc. Unicode support is therefore pretty much a requirement.
If there is no library for it, what is the best way to do this? Just make a mapping from each letter to an integer value and map the string to an integer list with that?
IBM’s ICU library does that (and a lot more). It has Python bindings: PyICU.
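A minimal sketch of how this might look with PyICU, assuming the package is installed and importable as icu. The helper name sort_with_icu and the locale identifiers used with it are illustrative choices, not part of the library:

```python
def sort_with_icu(words, locale_name):
    """Return words sorted by the ICU collation rules for locale_name."""
    # Imported here so this module still loads when PyICU is not installed.
    import icu

    collator = icu.Collator.createInstance(icu.Locale(locale_name))
    # getSortKey returns a bytes key whose byte order matches the collation order.
    return sorted(words, key=collator.getSortKey)
```

Calling sort_with_icu(["öl", "zebra", "apa"], "sv_SE") should put öl after zebra, while the same list with "de_DE" should put it before zebra, which is the language-dependent behaviour the question asks for.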
Update: The core difference in sorting between ICU and locale.strcoll is that ICU uses the full Unicode Collation Algorithm, while strcoll uses ISO 14651. The differences between those two algorithms are briefly summarized here: http://unicode.org/faq/collation.html#13. These are rather exotic special cases, which should rarely matter in practice.
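For comparison, here is a rough sketch of the standard-library route using the locale module. The helper names collation_locale and sort_with_locale are made up for this example, and a locale name such as "sv_SE.UTF-8" is only an assumption: it works only if that locale is installed on the system, otherwise setlocale raises locale.Error.

```python
import locale
from contextlib import contextmanager


@contextmanager
def collation_locale(name):
    """Temporarily switch LC_COLLATE to the named locale."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)  # raises locale.Error if unavailable
        yield
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


def sort_with_locale(words, locale_name):
    """Return words sorted by the C library's collation rules for locale_name."""
    with collation_locale(locale_name):
        # strxfrm turns each string into a key that compares like strcoll would.
        return sorted(words, key=locale.strxfrm)
```

Note that setlocale changes process-wide state, so this approach is not safe to use from several threads at once. ICU collator objects avoid that problem because each one carries its own locale.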