I have a string with Unicode and ASCII characters.
I can use utf8_decode to convert the ASCII characters to Unicode, but it also converts the characters that are already Unicode. How can I filter or convert only the ASCII characters in a mixed string?
For example:
utf8_decode("&#225; rỉ");
~> á rỉ
Two things. ASCII characters are 7-bit, 0x00 to 0x7F. So if you have a Unicode string, the ASCII characters don’t need to be converted, because they are the same in Unicode…
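To make that concrete, here is a minimal sketch. The helper name is_ascii is hypothetical, and the conversion call assumes the mbstring extension is loaded. It checks that every byte falls in 0x00 to 0x7F and shows that a pure-ASCII string comes out byte-for-byte identical when converted to UTF-8:

```php
<?php
// Hypothetical helper: true if every byte is 7-bit ASCII (0x00 to 0x7F).
function is_ascii(string $s): bool
{
    // Without the /u modifier the pattern works on raw bytes,
    // so any byte above 0x7F counts as a match.
    return preg_match('/[^\x00-\x7F]/', $s) === 0;
}

$plain = 'abc 123';

var_dump(is_ascii($plain));   // bool(true)
var_dump(is_ascii("\xE1"));   // bool(false): 0xE1 is outside the ASCII range

// Converting an ASCII-only string to UTF-8 changes nothing.
var_dump(mb_convert_encoding($plain, 'UTF-8', 'ISO-8859-1') === $plain); // bool(true)
```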
Now, your á is 0xE1, thus it’s not ASCII but ISO Latin 1. And you can’t have two encodings in one string (or you’re up shit creek…). So what you need is to convert from ISO Latin 1 to UTF-8.
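A rough sketch of that conversion, assuming the mbstring extension is available. The function name to_utf8 is made up for illustration, and the "already valid UTF-8, so leave it alone" check is a heuristic I am adding, not a guarantee: a Latin 1 string can occasionally happen to be valid UTF-8 too. It only works per string, not for two encodings mixed inside one string.

```php
<?php
// Hypothetical helper: return $s as UTF-8.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1 (Latin 1).
function to_utf8(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already valid UTF-8 (plain ASCII included), leave untouched
    }
    // Arguments: string, target encoding, source encoding.
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "\xE1 r";            // "á r" in Latin 1: á is the single byte 0xE1
$utf8   = "r\xE1\xBB\x89";     // "rỉ" in UTF-8: ỉ (U+1EC9) is E1 BB 89

echo bin2hex(to_utf8($latin1)), "\n"; // c3a12072 (á becomes C3 A1)
echo bin2hex(to_utf8($utf8)), "\n";   // 72e1bb89 (unchanged)
```

If you know for certain the whole input is Latin 1, you can skip the check and call mb_convert_encoding directly. iconv('ISO-8859-1', 'UTF-8', $s) does the same job if you prefer the iconv extension.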