Possible Duplicate:
How do I remove diacritics (accents) from a string in .NET?
I have the following string:
áéíóú
which I need to convert to:
aeiou
How can I achieve this? (I don’t need to compare strings; I need the new string so I can save it.)
This is not a duplicate of “How do I remove diacritics (accents) from a string in .NET?”: the accepted answer there doesn’t explain anything, which is why I’ve “reopened” it here.
It depends on requirements. For most uses, normalising to NFD and then filtering out all combining characters will do. For some cases, normalising to NFKD is more appropriate (if you also want to remove some further distinctions between characters, such as ligatures or superscript digits).
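A minimal sketch of that idea in Python, using the standard `unicodedata` module. This is Python rather than C#; in .NET, `string.Normalize` with `NormalizationForm.FormD` or `NormalizationForm.FormKD` plays the same role. The name `strip_diacritics` and the `compat` flag are illustrative only, and treating every character whose Unicode general category starts with "M" (Mn, Mc, Me) as a combining mark is an assumption that matches the usual definition.

```python
import unicodedata


def strip_diacritics(text: str, compat: bool = False) -> str:
    """Decompose `text`, then drop every combining mark (categories Mn, Mc, Me).

    compat=False uses NFD (canonical decomposition only);
    compat=True uses NFKD, which also folds compatibility characters.
    """
    form = "NFKD" if compat else "NFD"
    decomposed = unicodedata.normalize(form, text)
    # Keep only characters that are not marks; the base letters survive.
    return "".join(ch for ch in decomposed if not unicodedata.category(ch).startswith("M"))


print(strip_diacritics("áéíóú"))           # aeiou
print(strip_diacritics("ﬁ"))               # ﬁ  (NFD leaves the ligature alone)
print(strip_diacritics("ﬁ", compat=True))  # fi (NFKD splits it)
print(strip_diacritics("ł"))               # ł  (no decomposition, so the stroke stays)
```

The last line illustrates the limitation discussed next: a stroked letter like ł has no decomposition, so filtering marks cannot touch it.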
Some other distinctions will not be caught by this, notably stroked Latin characters such as ł, which have no decomposition. For some of these there’s also no clear locale-independent answer (should ł be considered equivalent to l or to w?), so you may need to customise beyond this.
There are also some cases where NFD and NFKD don’t work quite as expected, to allow for consistency between Unicode versions.
Hence a general approach: normalise to NFD (or NFKD), drop the combining marks, and pass every remaining character through an optional custom folding step. For the problem cases mentioned above, the default folding simply ignores them and leaves those characters unchanged. Generating the sequence of characters is also kept separate from building a string out of it, so nothing is wasted when there’s no need for string manipulation on the result (say we were going to write the chars to output next, or do some further char-by-char manipulation).
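A Python sketch of that structure, under the assumption that it mirrors the approach described rather than reproducing the answer's C# code. The names `remove_diacritics_iter`, `remove_diacritics` and `fold` are hypothetical. In .NET the same building blocks would be `string.Normalize(NormalizationForm.FormD)` (or `FormKD`) and `CharUnicodeInfo.GetUnicodeCategory`, checking for `UnicodeCategory.NonSpacingMark`, `SpacingCombiningMark` and `EnclosingMark`.

```python
import unicodedata
from typing import Callable, Iterator

# General categories for non-spacing, spacing-combining and enclosing marks.
_MARK_CATEGORIES = ("Mn", "Mc", "Me")


def _identity(ch: str) -> str:
    """Default folding: leave characters that have no decomposition unchanged."""
    return ch


def remove_diacritics_iter(text: str, compat: bool = False,
                           fold: Callable[[str], str] = _identity) -> Iterator[str]:
    """Lazily yield the characters of `text` with combining marks removed.

    Every character that survives the filter is passed through `fold`,
    which is where locale-specific or custom mappings can be plugged in.
    """
    form = "NFKD" if compat else "NFD"
    for ch in unicodedata.normalize(form, text):
        if unicodedata.category(ch) in _MARK_CATEGORIES:
            continue  # drop the accent, keep going
        yield fold(ch)


def remove_diacritics(text: str, compat: bool = False,
                      fold: Callable[[str], str] = _identity) -> str:
    """Build a new string only when the caller actually needs one."""
    return "".join(remove_diacritics_iter(text, compat, fold))


print(remove_diacritics("Crème brûlée"))  # Creme brulee
print(remove_diacritics("Łódź"))          # Łodz (the stroke survives by default)
```

A caller that only streams characters (for example to a file) can iterate over `remove_diacritics_iter` directly and never allocate the intermediate string.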
For example, if we also wanted to convert ł and Ł to l and L, but had no other specialised concerns, we could use a custom folding that maps just those two characters and leaves everything else alone.
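A self-contained Python sketch of such a folding, assuming the "normalise, drop marks, then fold" structure described above. `fold_l_with_stroke` and this compact `remove_diacritics` are hypothetical names for illustration. Because ł and Ł have no Unicode decomposition, normalisation passes them through untouched and the fold is what removes the stroke.

```python
import unicodedata
from typing import Callable


def fold_l_with_stroke(ch: str) -> str:
    """Hypothetical custom folding: ł -> l, Ł -> L, anything else unchanged."""
    return {"ł": "l", "Ł": "L"}.get(ch, ch)


def remove_diacritics(text: str, fold: Callable[[str], str] = lambda ch: ch) -> str:
    """NFD-decompose, drop combining marks, then apply `fold` to what remains."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(fold(ch) for ch in decomposed
                   if not unicodedata.category(ch).startswith("M"))


print(remove_diacritics("Łódź, Wrocław"))                      # Łodz, Wrocław
print(remove_diacritics("Łódź, Wrocław", fold_l_with_stroke))  # Lodz, Wroclaw
```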
Used with the methods above, this removes the stroke from ł and Ł as well as all the decomposable diacritics.