I am running into trouble with the PHP SAX parser and UTF-8 encoding.
The case:
I have an XML file that is encoded in UTF-8. The file is parsed using the standard PHP SAX parser. The data is stored in some container objects and inserted into a MySQL database. Unfortunately, some characters look wrong in the database (mostly German umlauts). For example, Gürtel looks like GÃ¼rtel.
The following code fragment shows how the parser is instantiated:
$saxParser = xml_parser_create("UTF-8");
Does this suffice to parse UTF-8 files? If yes, what am I missing? Some special database settings when inserting?
Thanks in advance.
Check the encoding step by step to find where it goes wrong:
When printing the values, make sure your browser reads the output with the correct encoding.
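One way to do that step-by-step check is to dump the raw bytes at each stage instead of trusting what the browser shows. The sketch below assumes PHP 7+ with the mbstring extension; describeEncoding() is a hypothetical helper, and the "database" value is simulated rather than read from MySQL.

```php
<?php
// Tell the browser how to read the output (skipped on the command line).
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/plain; charset=utf-8');
}

// Hypothetical helper: report UTF-8 validity and the raw bytes of a value.
function describeEncoding(string $label, string $value): string
{
    $valid = mb_check_encoding($value, 'UTF-8') ? 'valid UTF-8' : 'NOT valid UTF-8';
    return sprintf('%s: %s, bytes=%s', $label, $valid, bin2hex($value));
}

// "Gürtel" in UTF-8: the ü is the two bytes c3 bc.
$correct = "G\u{00FC}rtel";

// Simulate the bug: UTF-8 bytes treated as ISO-8859-1 and encoded to UTF-8 again.
$doubleEncoded = mb_convert_encoding($correct, 'UTF-8', 'ISO-8859-1');

echo describeEncoding('after parsing', $correct), PHP_EOL;
echo describeEncoding('read back (simulated)', $doubleEncoded), PHP_EOL;
```

Both lines report valid UTF-8, so a validity check alone does not catch double encoding. The byte dump does: c3bc is a correctly encoded ü, while c383c2bc means the value was encoded twice somewhere between the parser and the database.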
You have to ensure that every component uses the proper encoding:
PHP script
Save your PHP file as UTF-8 without a BOM, because a BOM can cause problems (for example, it counts as output sent before any headers). Use only the multibyte (mb_*) string functions when working with UTF-8 strings.
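A small sketch of both points, assuming PHP 7+ with the mbstring extension. hasUtf8Bom() is a hypothetical helper, not a built-in function.

```php
<?php
// Hypothetical helper: true if the bytes begin with the UTF-8 BOM (EF BB BF).
function hasUtf8Bom(string $bytes): bool
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

$word = "G\u{00FC}rtel"; // "Gürtel" encoded as UTF-8

var_dump(hasUtf8Bom("\xEF\xBB\xBFhello")); // bool(true)
var_dump(hasUtf8Bom($word));               // bool(false)

// Byte-oriented functions count bytes, multibyte functions count characters.
echo strlen($word), PHP_EOL;               // 7
echo mb_strlen($word, 'UTF-8'), PHP_EOL;   // 6
echo mb_strtoupper($word, 'UTF-8'), PHP_EOL; // GÜRTEL
```

To check a script itself, something like hasUtf8Bom(file_get_contents(__FILE__)) could be used.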
XML file
The XML file starts with
<?xml version="1.0" encoding="UTF-8" ?>
and the file is properly saved with the encoding set to UTF-8.
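Both conditions can be checked in code before blaming the parser. The sketch below is only an illustration: the sample document is made up and stands in for the real file, and it assumes PHP 7+ with the xml and mbstring extensions. Setting XML_OPTION_TARGET_ENCODING explicitly is a precaution here, not something the original setup is known to lack.

```php
<?php
// Made-up sample document standing in for the real file contents.
$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
     . "<item><name>G\u{00FC}rtel</name></item>";

// 1. The bytes must really be UTF-8, whatever the declaration claims.
$bytesAreUtf8 = mb_check_encoding($xml, 'UTF-8');

// 2. The declaration should name UTF-8 as well.
$declaresUtf8 = preg_match('/^<\?xml[^>]*encoding=["\']UTF-8["\']/i', $xml) === 1;

// 3. Parse with SAX and collect the character data the handlers receive.
$collected = '';
$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
xml_set_character_data_handler(
    $parser,
    function ($parser, string $data) use (&$collected) {
        $collected .= $data; // may be called several times per text node
    }
);
$parsedOk = xml_parse($parser, $xml, true) === 1;

var_dump($bytesAreUtf8, $declaresUtf8, $parsedOk);
echo bin2hex($collected), PHP_EOL; // c3bc in the dump is a correctly encoded ü
```

If the bytes handed to the handler are already correct at this point, the parser is not the culprit and the problem lies further down the chain, such as the database connection or the column definition.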
SQL column (collation)
Communication between MySQL server and PHP script
Run this command right after opening the connection to the MySQL server: