I have searched the Web thoroughly and cannot find a table of this kind of conversion. The ones I did find contain mistakes and are not very reliable, so I looked for an official table or something similar, but unfortunately I have not found one, so here I am.
As mentioned in the title, what I want to know is, for instance, what “Ã±” stands for (this one I already know: “ñ”), and not only for Spanish characters but for others too (I already know the Polish ones).
The main problem is that I have a string in PHP which sometimes arrives as, for instance, “eñe” (which is OK) and other times as “eÃ±e”. In the latter case I should be able to change it to “eñe” so it is readable, but if it is already OK I do not want to change it. To do this I was using the utf8_decode function, but when the string is already readable it still changes the “ñ” to “■” (but white), so I cannot always decode the string. And if I use the mb_detect_encoding function, I always get “UTF-8” as the response, which is not very helpful.
Once I know all of the UTF-8 characters written as, for instance, “Ã±” for “ñ”, “Å¹” for “Ź”, etc., I plan to write a function that basically replaces one with the other, which is more or less what utf8_decode does. Unless someone here has a better solution!
Thanks in advance!
Greetings!
Why do you want to do this? Do you want to recover corrupted data or so?
This really should not be done as part of the usual business code flow. All you need to do is ensure that all layers of your webapp are using UTF-8 properly: the PHP source, the HTTP response header and body, the DB table, the DB connection, et cetera. See also PHP UTF-8 cheatsheet.
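As a rough illustration of the layers listed above, a minimal sketch might look like the following. It assumes the mbstring extension is loaded and that the database is MySQL accessed through mysqli; the question does not say which database is used, so openUtf8Connection and its parameters are hypothetical names chosen only for this example.

```php
<?php
// PHP layer: make the multibyte string functions default to UTF-8.
mb_internal_encoding('UTF-8');

// HTTP layer: tell the browser that the response body is UTF-8.
header('Content-Type: text/html; charset=UTF-8');

// DB connection layer (assumption: MySQL via mysqli).
// Hypothetical helper: opens a connection and switches it to UTF-8.
function openUtf8Connection(string $host, string $user, string $password, string $database): mysqli
{
    $db = new mysqli($host, $user, $password, $database);
    // utf8mb4 is MySQL's name for full UTF-8 (up to 4 bytes per character).
    $db->set_charset('utf8mb4');
    return $db;
}
```

The tables themselves would also need a UTF-8 character set, and the PHP source files would need to be saved as UTF-8; neither can be shown in code here.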
If you actually want to do this as a one-time task to recover corrupted data, then it’s good to know that the corrupted data in your question indicates UTF-8 data which has incorrectly been stored or displayed as ISO-8859-1. You just need to read the data as ISO-8859-1 and write it as UTF-8. One time. Then do it the right way.
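For the case described in the question, where some strings are fine and others have been encoded twice, a one-time repair could be sketched as below. This is only a heuristic under stated assumptions: the mbstring extension is available, the wrong encoding really was ISO-8859-1, and repairDoubleEncodedUtf8 is a hypothetical name. It undoes one round of encoding and keeps the result only if it is still valid UTF-8, so an already correct “eñe” is left alone. A string that legitimately contains the text “Ã±” would be converted by mistake, which is one more reason to run this once over known-bad data rather than in normal code flow. (utf8_decode performs the same conversion unconditionally, and it is deprecated as of PHP 8.2.)

```php
<?php
// Hypothetical helper: undo one round of "ISO-8859-1 bytes re-encoded as UTF-8".
function repairDoubleEncodedUtf8(string $text): string
{
    // Input that is not valid UTF-8 is outside the scope of this sketch.
    if (!mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // Double-encoded text contains only code points U+0000..U+00FF, because
    // every original byte was turned into exactly one such character.
    // Anything above that range (e.g. a correct "Ź") cannot be mojibake.
    if (preg_match('/[^\x{00}-\x{FF}]/u', $text) === 1) {
        return $text;
    }

    // Map each character back to the single byte it came from.
    $candidate = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');

    // Accept the result only if the recovered bytes form valid UTF-8.
    // A correct "ñ" becomes the lone byte 0xF1, which is invalid, so it is kept.
    if ($candidate !== $text && mb_check_encoding($candidate, 'UTF-8')) {
        return $candidate;
    }

    return $text;
}

// Byte escapes are used so the example does not depend on the file's encoding.
echo repairDoubleEncodedUtf8("e\xC3\x83\xC2\xB1e"), "\n"; // double-encoded input
echo repairDoubleEncodedUtf8("e\xC3\xB1e"), "\n";         // already correct input
```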
As evidence, the ñ (Unicode Character ‘LATIN SMALL LETTER N WITH TILDE’ (U+00F1)) is represented in UTF-8, a multi-byte encoding of Unicode, by the bytes 0xC3 and 0xB1. When those bytes are decoded using a single-byte encoding like ISO-8859-1, the 0xC3 becomes Ã and the 0xB1 becomes ±. See also the ISO-8859-1 codepage layout.
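The byte-level explanation above can be checked with a few lines of PHP. This is a small sketch that assumes the mbstring extension is loaded; the variable names are chosen only for the example.

```php
<?php
// "ñ" (U+00F1) written as explicit UTF-8 bytes, so the example does not
// depend on the encoding of this source file.
$utf8 = "\xC3\xB1";

// The two bytes UTF-8 uses for the character.
echo strtoupper(bin2hex($utf8)), "\n"; // C3B1

// Treat each byte as one ISO-8859-1 character and re-encode it as UTF-8.
// 0xC3 becomes "Ã" (U+00C3) and 0xB1 becomes "±" (U+00B1).
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo $misread, "\n";
echo strtoupper(bin2hex($misread)), "\n"; // C383C2B1
```

The same arithmetic explains the other pairs in the question: “Ź” (U+0179) is the bytes 0xC5 and 0xB9 in UTF-8, which read as “Å” and “¹” in ISO-8859-1.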