I’ve encountered this problem a few times, and now I finally decided to ask, hoping someone knows what I’m talking about.
What I wish to do is this form of char conversion:
ÆØÅ => AOA
ÉÈÊ => EEE
üÿï => uyi
So far, the closest I’ve come to search criteria I can type into Google is this:
This did not work as expected. There seemed to be no stronger correlation between ÉÈÊ and E than between ÆØÅ and E. So, held up against E, all six chars would’ve been converted to E, which wasn’t the accuracy I was looking for.
- Conversion from the origin encoding (e.g. ASCII) to a charset/encoding consisting of only alphanumerics
I’m not very confident about this approach as the encoding would have to be able to recognize, say E, as an ancestor/nearest (alphanumeric) neighbour of È.
I feel like I’m saying a lot of words which are around the ballpark.
Does anyone understand what I’m trying to achieve, or know what this “method” I’m looking for is called?
Any ideas/thoughts are very much appreciated (and I do mean any),
- Mik
I suspect you’d have to consider a database of Unicode codepoints, mapping them to their nearest US-ASCII equivalent (where possible). I imagine it would be a relatively sparse map, since most Unicode codepoints don’t have a US-ASCII equivalent.
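As a rough illustration of that idea, here is a minimal Python sketch. It assumes the standard `unicodedata` module is an acceptable stand-in for the "database of codepoints": NFKD normalization splits an accented letter into a base letter plus combining marks, and the marks are then dropped. Letters such as Æ and Ø have no such decomposition, so they need a hand-written table. The names `to_ascii` and `FALLBACK`, and the choice of Æ => A, are my own assumptions taken from your examples, not any standard mapping.

```python
import unicodedata

# Hypothetical hand-written table for letters with no Unicode decomposition.
# The targets follow the examples in the question (Æ => A, Ø => O).
FALLBACK = {"Æ": "A", "Ø": "O", "æ": "a", "ø": "o"}


def to_ascii(text: str) -> str:
    """Map each character to a nearby US-ASCII character where one is known."""
    out = []
    for ch in text:
        if ch in FALLBACK:
            out.append(FALLBACK[ch])
            continue
        # NFKD turns e.g. "É" into "E" followed by U+0301 (combining acute).
        for part in unicodedata.normalize("NFKD", ch):
            # Keep only non-combining ASCII characters; anything without
            # an ASCII equivalent is silently dropped.
            if not unicodedata.combining(part) and ord(part) < 128:
                out.append(part)
    return "".join(out)


print(to_ascii("ÆØÅ"), to_ascii("ÉÈÊ"), to_ascii("üÿï"))
```

Characters that neither decompose nor appear in the table simply vanish from the output here, which may or may not be what you want; a real solution would probably need a much larger table.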
Hopefully this answer contains some keywords that help you search for what you want.