The characters I get from the URL, for example http://www.mydomain.com/?name=john, were fine as long as they were not Russian.
If they were in Russian, I got ‘����’.
So I added $name = iconv("cp1251", "utf-8", $name); and now it works fine for Russian and English characters, but it screws up other languages. :)))
For example, ‘Jānis’ (Latvian), which worked fine before iconv, now turns into ‘jДЃnis’.
Any idea whether there is a universal converter that handles Cyrillic without breaking other languages?
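The garbled Latvian name follows directly from running iconv("cp1251", "utf-8", ...) on input that is already UTF-8. In UTF-8 the letter ā is the byte pair C4 81, and read as Windows-1251 those two bytes are Д and Ѓ. Here is a minimal reproduction, assuming the PHP file is saved as UTF-8 and the iconv extension is enabled:

```php
<?php
// "Jānis" as UTF-8: the "ā" (U+0101) is encoded as the byte pair C4 81.
$utf8Name = "J\u{0101}nis";

// Misreading those UTF-8 bytes as Windows-1251 maps C4 -> "Д" and 81 -> "Ѓ".
$garbled = iconv("CP1251", "UTF-8", $utf8Name);

echo bin2hex($utf8Name), "\n"; // 4ac4816e6973
echo $garbled, "\n";           // JДЃnis
```

So the conversion is only correct for bytes that really are Windows-1251. Applied unconditionally, it corrupts every non-ASCII character that arrived as UTF-8.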
Actually, this comes down to how the URL is encoded. If you click a link on a page, the browser uses that page's encoding to send the request. If you type the URL directly into the browser's address bar, however, the behavior is effectively undefined, because no standard says which encoding to use (Firefox provides an about:config switch to use UTF-8 encoded URLs).
Apart from some kind of encoding detection, there is no way to know which encoding was used for the URL in a given request.
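One possible form of that detection, sketched under a strong assumption: any value that is not valid UTF-8 is taken to be Windows-1251, which matches the Russian case in the question. The helper name normalize_query_value is hypothetical. The UTF-8 check uses the common preg_match('//u', ...) idiom, which fails on invalid UTF-8.

```php
<?php
/**
 * Hypothetical helper: return $value as UTF-8, guessing its original encoding.
 * Assumption (a heuristic, not a standard): bytes that already form valid
 * UTF-8 are kept as-is; anything else is treated as Windows-1251.
 */
function normalize_query_value(string $value): string
{
    // With the /u modifier, preg_match() returns 1 for valid UTF-8 (the empty
    // pattern always matches) and false if the subject is not valid UTF-8.
    if (preg_match('//u', $value) === 1) {
        return $value;
    }
    // Not valid UTF-8: assume the legacy Cyrillic code page.
    $converted = iconv('CP1251', 'UTF-8', $value);
    // iconv() returns false if a byte has no CP1251 mapping (e.g. 0x98);
    // in that case the raw value is returned unchanged.
    return $converted === false ? $value : $converted;
}

// Typical use on the question's parameter:
// $name = normalize_query_value($_GET['name'] ?? '');
```

This keeps ‘Jānis’ intact and still fixes CP1251-encoded Russian. It is only a guess, though. Latin-1 bytes such as e4 f6 fc (äöü) are not valid UTF-8 either, so this helper would turn them into ‘дць’. It is only appropriate when the non-UTF-8 requests are known to be Cyrillic.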
EDIT:
Just to back up what I said above, I wrote a small test script that shows the default behavior of the five major browsers (running on Mac OS X in my case, and on Windows Vista via Parallels for IE):
Calling

http://path/to/script.php?p=äöü

leads to these bytes:

- the first three browsers: c3 a4 c3 b6 c3 bc (each)
- Opera and IE: e4 f6 fc (each)

So the first three obviously use UTF-8 encoded URLs, while Opera and IE use ISO-8859-1 or one of its variants. Conclusion: you cannot be sure which encoding is used for textual data sent via a URL.
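The test script mentioned above is not shown. A minimal sketch of one that would produce output in this form simply dumps the raw bytes of the p parameter as hex. It assumes the script is deployed as the script.php from the example URL, and the helper name hex_bytes is hypothetical:

```php
<?php
/**
 * Hypothetical helper: format a string's raw bytes as space-separated hex,
 * e.g. "äöü" in UTF-8 becomes "c3 a4 c3 b6 c3 bc".
 */
function hex_bytes(string $s): string
{
    // bin2hex() yields "c3a4...", str_split() cuts it into 2-character bytes.
    return implode(' ', str_split(bin2hex($s), 2));
}

// When requested as script.php?p=..., print the bytes exactly as PHP received
// them (PHP has already percent-decoded the query string into $_GET).
if (isset($_GET['p'])) {
    header('Content-Type: text/plain');
    echo hex_bytes($_GET['p']), "\n";
}
```

Printing hex instead of the string itself avoids a second round of guessing, because the browser displaying the result would otherwise apply its own encoding to the output.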