I am currently writing a parser for HTML files generated from .doc files. The strings contain symbols like alpha, beta, etc. The problem is that when I call urldecode(urlencode(alpha)), it does not give me the symbol back; it returns something else.
To reproduce my problem, just run:
urldecode("%0A%20%20If%20%3Ci%20style%3D%22mso-bidi-font-style%3Anormal%22%3E%CE%B1%3C%2Fi%3E%2C%20b%2C%20g%0A%20%20be%20the%20zeroes%20of%20the%20polynomial%20%3Ci%20style%3D%22mso-bidi-font-style%3Anormal%22%3Eax%3C%2Fi%3E%3Csup%3E3%3C%2Fsup%3E%0A%20%20%2B%20b%3Ci%20style%3D%22mso-bidi-font-style%3Anormal%22%3Ex%3C%2Fi%3E%3Csup%3E2%3C%2Fsup%3E%20%2B%20c%3Ci%20style%3D%22mso-bidi-font-style%3Anormal%22%3Ex%3C%2Fi%3E%20%2B%20d%2C%20the%20the%20value%20of%20%3Ci%20style%3D%22mso-bidi-font-style%3Anormal%22%3E%26nbsp%3B%CE%B1%3C%2Fi%3Eb%20%2B%20bg%20%2B%20g%3Ci%20style%3D%22mso-bidi-font-style%3Anormal%22%3E%20%CE%B1%3C%2Fi%3E%26nbsp%3B%20is%0A%20%20");
Is there a way to fix this?
You have a character set mismatch. urldecode() is most likely returning the symbol correctly as UTF-8 (%CE%B1 is the UTF-8 byte sequence for α), but the browser is interpreting the page as something else, probably Latin-1. To confirm that, choose UTF-8 from your browser’s View > Encoding menu. To fix it, send a Content-Type header that declares charset=utf-8, so the site is always interpreted as UTF-8.
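A minimal sketch of such a header in PHP, assuming the page is served as text/html and that nothing has been output before the call. The short encoded string is only an illustrative excerpt of the kind of input in the question:

```php
<?php
// header() must run before any output is sent; once output has started,
// PHP can no longer change the response headers.
header('Content-Type: text/html; charset=utf-8');

// %CE%B1 is the UTF-8 byte sequence for α (U+03B1), so urldecode()
// returns the bytes 0xCE 0xB1, which a UTF-8 page renders as α.
echo urldecode('%3Ci%3E%CE%B1%3C%2Fi%3E'); // outputs: <i>α</i>
```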
This means you also need to make sure the rest of your site outputs valid UTF-8. Otherwise, convert the decoded text so that it matches the encoding your site actually uses.