In Java, the no-argument String#toLowerCase() method uses the default system Locale to decide how to lowercase text. If I am lowercasing some ASCII text and want to be sure that it is processed as expected, which Locale should I use?
EDIT: I’m mainly concerned about programming identifiers, such as table and column names in a schema. As such, I want English lowercasing rules to apply.
Locale.ROOT is documented as the language/country-neutral locale for locale-sensitive operations.
Locale.ENGLISH would presumably also be a safe choice.
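A minimal sketch of passing an explicit locale instead of relying on the JVM default. The helper name lowerIdentifier and the column name CUSTOMER_ID are hypothetical, chosen only for illustration; both Locale.ROOT and Locale.ENGLISH produce the same result for plain ASCII input like this.

```java
import java.util.Locale;

public class IdentifierCase {
    // Hypothetical helper: lowercases a schema identifier independently of the
    // JVM's default locale by passing Locale.ROOT explicitly.
    static String lowerIdentifier(String identifier) {
        return identifier.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String column = "CUSTOMER_ID";
        // Both calls print "customer_id" regardless of the default locale.
        System.out.println(lowerIdentifier(column));
        System.out.println(column.toLowerCase(Locale.ENGLISH));
    }
}
```

The key design choice is never calling the no-argument toLowerCase() on identifiers, since its result depends on whatever locale the JVM happens to start with.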
Yes, Locale.ENGLISH is a safe choice for case operations on things like programming-language identifiers and URL parts, since it doesn’t involve any special casing rules and, in the ENGLISH locale, all 7-bit ASCII characters case-convert to 7-bit ASCII characters. That is not true for all other locales. In Turkish, for example, the ‘I’ and ‘i’ characters are not case-converted to one another: ‘I’ lowercases to the dotless ‘ı’, and ‘i’ uppercases to the dotted ‘İ’.
The article "Dotted and dotless I" explains this in more detail.
The list of special exceptions is maintained at http://unicode.org/Public/UNIDATA/SpecialCasing.txt
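The Turkish behavior described above can be reproduced with a short sketch. It assumes the "tr-TR" language tag as a stand-in for a JVM whose default locale is Turkish; the identifier TITLE is illustrative only.

```java
import java.util.Locale;

public class TurkishCaseDemo {
    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        String table = "TITLE";

        // With a Turkish locale, 'I' lowercases to U+0131 (dotless i), so the
        // result is "tıtle", which is no longer pure ASCII.
        String trLower = table.toLowerCase(turkish);
        // Locale-neutral lowercasing keeps the result in ASCII.
        String rootLower = table.toLowerCase(Locale.ROOT);

        System.out.println(trLower.equals("title"));   // false
        System.out.println(rootLower.equals("title")); // true
        System.out.println(trLower.indexOf('\u0131')); // 1

        // The reverse direction: 'i' uppercases to U+0130 (dotted capital I).
        System.out.println("title".toUpperCase(turkish).equals("TITLE")); // false
    }
}
```

A schema lookup that lowercases "TITLE" with the default locale on such a JVM would therefore fail to match a column named "title".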