If I have a string of UTF-8 characters that needs to be output to an older system as UTF-7, I have two questions:
- How can I efficiently convert a string s which has UTF-8 characters to the same string without those characters?
- Is there a simple way of converting extended characters like ‘Ō’ to their closest non-extended equivalent ‘O’?
If the older system can actually handle UTF-7 properly, why do you want to remove anything? Just encode the string as UTF-7:
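A minimal sketch, assuming Python 3 and that `s` is a `str` (the sample text is made up for illustration):

```python
# Sample text is an assumption for illustration; any str works.
s = "T\u014dky\u014d"  # "Tōkyō"

# str.encode with the built-in "utf-7" codec returns bytes.
utf7_bytes = s.encode("utf-7")

# Every byte is 7-bit, so an ASCII-only channel can carry it.
assert all(byte < 128 for byte in utf7_bytes)

# Decoding recovers the original text, so nothing is lost.
assert utf7_bytes.decode("utf-7") == s
```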
Then send the UTF-7-encoded text down to the older system.
If you’ve got the original UTF-8-encoded bytes, you can do this in one step:
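Again assuming Python 3, something like the following should work; `utf8_bytes` is a hypothetical name standing in for whatever bytes you actually received:

```python
# Stand-in for UTF-8 bytes read from a file or socket (assumed sample data).
utf8_bytes = "T\u014dky\u014d".encode("utf-8")

# Decode the UTF-8 bytes to text, then re-encode as UTF-7, in one expression.
utf7_bytes = utf8_bytes.decode("utf-8").encode("utf-7")
```

There is no direct bytes-to-bytes transcoding here: the data still passes through a `str` in the middle, it just never gets a name.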
If you actually need to convert to ASCII, you can do this reasonably easily.
To remove the non-ASCII characters:
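One way to do this in Python 3 (an assumption about your environment) is to encode with the `ignore` error handler, which silently drops anything ASCII cannot represent:

```python
# Sample text is an assumption for illustration.
s = "T\u014dky\u014d"  # "Tōkyō"

# errors="ignore" skips every character that has no ASCII encoding.
ascii_bytes = s.encode("ascii", errors="ignore")

# Decode again if you need a str rather than bytes.
ascii_text = ascii_bytes.decode("ascii")

print(ascii_text)  # Tky
```

Note that the characters are removed outright, not replaced, so the result can be hard to read.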
To convert non-ASCII to nearest equivalent:
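A standard-library sketch, assuming Python 3, is to decompose each character with `unicodedata.normalize` and then drop the leftover combining marks. The helper name `to_nearest_ascii` is hypothetical:

```python
import unicodedata


def to_nearest_ascii(text: str) -> str:
    """Hypothetical helper: strip accents by decomposing, then dropping non-ASCII."""
    # NFKD splits "Ō" into "O" plus a combining macron (U+0304).
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping non-ASCII removes the combining marks and keeps the base letters.
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


print(to_nearest_ascii("\u014c"))  # O
```

This only helps for characters that have a Unicode decomposition. Letters such as ‘ø’ or ‘ß’ have none and are simply dropped; a transliteration library such as the third-party Unidecode package (`unidecode.unidecode(s)`) is one option if you need those handled too.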