So I have programmed a crawler to scrape information and data from a website with charset utf8. But when I tried to store the contents into MySQL, some special characters, such as Spanish letters), did not show correctly in MySQL.
Here is what I have done:
- Put
header("Content-Type: text/html; charset=utf-8")in PHP - Set all charset in MySQL into
utf8-unicode-ci - Have
$conn->query("SET NAMES 'utf8'")this upon connection - Double checked that the html I parsed was encoded in utf-8
So what are some potentially problems here?
Maybe you coded your crawler using functions which are not supposed to manage multi-byte characters.
For example strlen instead of mb_strlen.
Try putting:
as first line of your php coce, and then check if you have to convert some functions in their respective mb version.
Have a look at multibyte string reference
As a last chance you may play with iconv function just before inserting the string into mysql.
Something as:
should do the trick