I’m having trouble with a string, taken from a webpage, that contains foreign (non-ASCII) characters.
The string is generated by parsing the webpage with str_get_html() and then reading $htmldom->innertext (simple_html_dom class library).
When I output the string using htmlentities() it displays fine, but when I use explode() on the string and print the parts, each foreign character shows up as a tilted block with a question mark in it.
I need to store the string in a utf8 MySQL database, so I need the right foreign characters.
My page sends a header that declares the UTF-8 character set.
I have already tried mb_split() and preg_split(), but those have the same problem.
I solved the issue with forceutf8:
https://github.com/neitanod/forceutf8
It has a great function that converts a string to UTF-8, as long as the input is Latin1 (ISO 8859-1), Windows-1252, already UTF-8, or a mix of them.
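
For anyone who only wants to see the idea, here is a rough sketch of the same kind of normalisation using PHP's built-in mbstring functions. This is not the forceutf8 code: the helper name to_utf8_sketch() is made up for illustration, it assumes the mbstring extension is loaded, and it assumes any input that is not valid UTF-8 is Windows-1252. Unlike the library, it does not handle strings that mix encodings.

```php
<?php
// Hypothetical helper, not part of forceutf8: normalise a string to UTF-8.
function to_utf8_sketch(string $s): string
{
    // Valid UTF-8 (plain ASCII included) is returned untouched.
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    // Assumption: everything else is Windows-1252, which also covers
    // the printable range of ISO 8859-1.
    return mb_convert_encoding($s, 'UTF-8', 'Windows-1252');
}

// "café au lait" with the é stored as a single ISO 8859-1 byte (0xE9).
$latin1 = "caf\xE9 au lait";

// Convert first, then split, so every part is valid UTF-8.
$parts = explode(' ', to_utf8_sketch($latin1));

echo $parts[0], "\n"; // prints "café" on a UTF-8 page or terminal
```

Converting before calling explode() means every part is valid UTF-8 by the time it is printed or stored in the database.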
Many thanks go to Sebastian Grignoli.