Duplicate of 249087
I have a bunch of user-generated addresses that may contain characters with diacritic marks. Apart from a straightforward character-by-character replace, what is the most generic way to automatically convert any such characters to their closest English equivalents?
E.g. any of àâãäå would become a
æ would become the two separate letters ae
ç would become c
any of èéêë would become e
etc. for all possible letter variations (preferably without having to find and encode lookups for each diacritic form of the letter).
(Note: I have to pass these addresses on to third party software that is incapable of printing anything other than English characters. I’d rather the software was capable of handling them, but I have no control over that.)
EDIT: Never mind… Found the answer [here][2]. It showed up in the ‘Related’ section to the right of the question after I posted, but not in my prior search or as a pre-post suggestion. Hmm. I added the ‘diacritics’ tag to the other question in any case.
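The linked answer is not reproduced here, so the following is only a sketch of one common approach and not necessarily what that answer does. It is written in Python as an assumption, since the question does not name a language. The idea is to decompose each character into a base letter plus combining marks (Unicode NFKD normalization) and then drop the marks. Letters such as æ have no decomposition, so they still need a small explicit table; `SPECIAL_CASES` and `strip_diacritics` are hypothetical names, and the table is illustrative rather than complete.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit mapping.
# Illustrative only; extend it for the alphabets you expect to see.
SPECIAL_CASES = {
    "æ": "ae", "Æ": "AE",
    "œ": "oe", "Œ": "OE",
    "ø": "o", "Ø": "O",
    "ł": "l", "Ł": "L",
    "đ": "d", "Đ": "D",
    "ß": "ss",
}


def strip_diacritics(text: str) -> str:
    """Return an ASCII-only approximation of text."""
    # 1. Replace the letters that normalization cannot decompose.
    replaced = "".join(SPECIAL_CASES.get(ch, ch) for ch in text)
    # 2. Split accented letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFKD", replaced)
    # 3. Drop the combining marks (accents, cedillas, rings, ...).
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # 4. Anything still outside ASCII is silently dropped.
    return without_marks.encode("ascii", "ignore").decode("ascii")


print(strip_diacritics("àâãäå æ ç èéêë"))
```

Step 4 discards characters that have no ASCII approximation at all (for example, non-Latin scripts), so it may be worth logging or flagging addresses that come out noticeably shorter than they went in.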
EDIT 2: Jeez! Who voted this -1 after I closed it?
I was just about to post the same link 🙂
Sounds like you’re doing this already, but I would recommend that you store the original string for display in your application, and only do the conversion for the third-party software. People get cranky if they feel their real name isn’t treated as important 🙂
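A minimal sketch of that suggestion, assuming Python and hypothetical names (`Address`, `export_form`, `ascii_fold`): the record keeps only what the user typed, and the ASCII-only form is derived at the moment the address is handed to the third-party software. The folding function here is deliberately simplified and does not handle letters such as æ that have no Unicode decomposition.

```python
import unicodedata
from dataclasses import dataclass


def ascii_fold(text: str) -> str:
    """Simplified folding: decompose, drop combining marks, drop other non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return kept.encode("ascii", "ignore").decode("ascii")


@dataclass(frozen=True)
class Address:
    # Exactly what the user entered; this is what the application displays.
    original: str

    @property
    def export_form(self) -> str:
        # Derived on demand for the third-party system; never stored over the original.
        return ascii_fold(self.original)


addr = Address("12 Rue de l'Église, Besançon")
print(addr.original)     # shown to the user, accents intact
print(addr.export_form)  # sent to the ASCII-only software
```

Deriving the export form instead of storing it means a later improvement to the folding rules applies to every address without a data migration.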