Suppose I have a string that contains Ü. How would I find all those unicode characters? Should I test for their code? How would I do that?
For example, given the string “AÜXÜ”, I’d like to transform it to “AYXY”. I’d like to do the same for other unicode characters, and I would hate to have to store them in a translation map of some sort.
The definition of “unicode characters” is vague, but I’ll take it to mean characters that fall outside a given charset such as ISO 8859-1. Note that Ü itself (U+00DC) is part of ISO 8859-1, so US-ASCII may be the set you actually want to test against. If this is true in your case, then loop through all characters in the String and test each one’s code point to determine whether it is within the given character set.
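A minimal sketch of that loop, assuming the character set can be expressed as a simple upper bound on the code point (0x7F for US-ASCII, 0xFF for ISO 8859-1, whose code points coincide with the first 256 Unicode code points). The class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointFinder {
    // Hypothetical helper: collects every code point in s that is greater
    // than maxCodePoint, i.e. outside the range [0, maxCodePoint].
    static List<Integer> findOutside(String s, int maxCodePoint) {
        List<Integer> found = new ArrayList<>();
        // codePoints() (Java 8+) iterates whole code points, so characters
        // stored as surrogate pairs are handled as one unit.
        s.codePoints().forEach(cp -> {
            if (cp > maxCodePoint) {
                found.add(cp);
            }
        });
        return found;
    }

    public static void main(String[] args) {
        String input = "A\u00DCX\u00DC"; // "AÜXÜ"
        System.out.println(findOutside(input, 0x7F)); // prints [220, 220]
        System.out.println(findOutside(input, 0xFF)); // prints []
    }
}
```

This only finds the characters; deciding what to replace them with is a separate question.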
Alternatively, use a Map<Character, Character> and replace each character in the string that matches one of the map’s keys with the corresponding value.
Or do you mean “all characters with diacritics”? If so, then use java.text.Normalizer to decompose the string and remove the diacritical marks.
One pitfall: Ü would become U, not Y. Not sure if that’s what you’re after. If you want to replace by pronounced character, you’ll really need to create a mapping. Sure, it’s tedious work, but it takes less time than you needed to follow this topic.
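Here is a rough sketch of both approaches side by side. The class name, the method names, and the contents of the OVERRIDES map are assumptions for illustration (only the Ü to Y mapping comes from the question); Normalizer.normalize and Normalizer.Form.NFD are the standard java.text API:

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

public class DiacriticReplacer {
    // Hypothetical mapping table; fill it with whatever replacements you need.
    static final Map<Character, Character> OVERRIDES = new HashMap<>();
    static {
        OVERRIDES.put('\u00DC', 'Y'); // Ü -> Y, as asked in the question
        OVERRIDES.put('\u00FC', 'y'); // ü -> y (assumed lowercase counterpart)
    }

    // Replaces every char that is a key in the map; all others pass through.
    static String applyMap(String s, Map<Character, Character> map) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            char mapped = map.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    // Decomposes to NFD (base letter followed by combining marks), then
    // removes the combining marks (Unicode general category M).
    static String stripDiacritics(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        String input = "A\u00DCX\u00DC"; // "AÜXÜ"
        System.out.println(applyMap(input, OVERRIDES)); // prints AYXY
        System.out.println(stripDiacritics(input));     // prints AUXU
        // Map first, then strip whatever diacritics remain ("AÜXé" here).
        System.out.println(stripDiacritics(applyMap("A\u00DCX\u00E9", OVERRIDES))); // prints AYXe
    }
}
```

Applying the map before stripping lets a small table handle the pronunciation-specific cases, while Normalizer takes care of everything else, so the map may not need to be as large as feared.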