I have been struggling with encoding problems in MySQL for a while. I am building a database that will contain not only Latin but also Cyrillic and Arabic text. Here is an example of how I create the database:
CREATE DATABASE db1
DEFAULT CHARACTER SET utf8
COLLATE utf8_unicode_ci;
Then a table:
CREATE TABLE TempTb1
(
ID INT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
arabic VARCHAR(100) NOT NULL
)
DEFAULT CHARACTER SET utf8
COLLATE utf8_unicode_ci;
When I insert some data and select it, I get only strange characters. So I wrote a small PHP script to test it, but it doesn’t work either:
<?php
header('Content-type: text/plain; charset=utf-8');
$a = mysql_connect('localhost','root','') or die('Problem connecting to database!');
$b = mysql_select_db('db1') or die('Problem selecting database');
mysql_set_charset('utf8');
mysql_query("set names 'utf8'");
mysql_query('set character set utf8');
$query = mysql_query("SELECT * FROM TempTb1;");
while($row = mysql_fetch_assoc($query))
{
$id = $row['ID'];
$name = $row['name'];
$arabic = $row['arabic'];
echo $id.' '.$name.' '.$arabic.PHP_EOL;
}
?>
I have tested with both utf8_unicode_ci and utf8_general_ci. What could be wrong? BTW I have EasyPHP 5.2.10.
Whatever is happening to your characters probably happens before they reach MySQL. When we type characters, the computer converts them to numbers. Those numbers then travel between web forms and web servers, web servers and script interpreters, and on to the database server, and they take the same route back to the web page.
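To make the "characters are numbers" point concrete, here is a small PHP sketch. It assumes the mbstring extension is available, and the sample word is only an illustration. It prints the bytes of a UTF-8 string, then shows what you get when those same bytes are misread as ISO-8859-1, which is a typical source of "strange characters":

```php
<?php
// "Привет" written as explicit UTF-8 bytes, so the result does not
// depend on the encoding this file happens to be saved in.
$utf8 = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";

// Each Cyrillic letter occupies two bytes in UTF-8.
echo bin2hex($utf8), PHP_EOL;            // d09fd180d0b8d0b2d0b5d182
echo strlen($utf8), PHP_EOL;             // 12 (bytes)
echo mb_strlen($utf8, 'UTF-8'), PHP_EOL; // 6 (characters)

// Simulate a misreading: treat every byte as one ISO-8859-1 character
// and re-encode the result as UTF-8. The text now has 12 characters.
$garbled = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo mb_strlen($garbled, 'UTF-8'), PHP_EOL; // 12

// The damage is reversible as long as no step dropped any bytes.
$restored = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $utf8);           // bool(true)
```

The numbers never change on their own. Garbling appears when two steps in the chain disagree about what the numbers mean.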
Where and how do you enter your data? Data should come out the same way it went in. If it comes from web forms, check the encoding of your web pages and how the forms are submitted, how you read the values in your PHP scripts, and how you send them to the database server. The culprit is probably not MySQL. It can be, but it is only one of several places where things can go wrong, and it is not the most likely one.
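One way to check the "how you read the values in your PHP scripts" step is to verify that incoming values are well-formed UTF-8 before they go anywhere near the database. This is only a sketch: it assumes PHP 7 or later with the mbstring extension, and isValidUtf8 is a hypothetical helper name, not something from your code:

```php
<?php
// Hypothetical helper: returns true when $value is well-formed UTF-8.
function isValidUtf8(string $value): bool
{
    return mb_check_encoding($value, 'UTF-8');
}

// "مرحبا" as explicit UTF-8 bytes, standing in for a submitted form field.
$fromForm = "\xD9\x85\xD8\xB1\xD8\xAD\xD8\xA8\xD8\xA7";

// "café" as ISO-8859-1 bytes: the lone 0xE9 byte is not valid UTF-8.
$latin1 = "caf\xE9";

var_dump(isValidUtf8($fromForm)); // bool(true)
var_dump(isValidUtf8($latin1));   // bool(false)
```

In a real script you would pass something like $_POST['arabic'] to the helper. A false result means the form was submitted in a different encoding than the one you expect.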
Check your pages, and check the headers as they arrive at your browser.
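If you want to check the declared charset in code rather than by eye, the following sketch pulls it out of a Content-Type header value. It assumes PHP 7.1 or later; charsetFromContentType is a hypothetical helper, and the regular expression covers only the common forms of the header:

```php
<?php
// Hypothetical helper: extracts the charset parameter from a Content-Type
// header value, or returns null when no charset is declared.
function charsetFromContentType(string $header): ?string
{
    if (preg_match('/charset\s*=\s*"?([^";\s]+)"?/i', $header, $m)) {
        return strtolower($m[1]);
    }
    return null;
}

var_dump(charsetFromContentType('text/plain; charset=utf-8')); // string(5) "utf-8"
var_dump(charsetFromContentType('text/html; charset="UTF-8"')); // string(5) "utf-8"
var_dump(charsetFromContentType('text/html'));                  // NULL
```

The header value itself can be read from the network tab of your browser's developer tools. A null result means the browser has to guess the encoding, and it may guess wrong.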
As for the comments on your question: no, using ISO5 is not a good idea, because you would need several encodings of that family. Go with a Unicode encoding; most of the time the best choice is UTF-8. This is also not about which MySQL library you use, unless that library has known bugs, which is very unlikely for something that old. 🙂 You should still follow the recommended best practices, but your current problem is not related to the library. The problem lies in the difference between how you enter your data and how you view it.
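To compare what went in with what comes out, it can help to look at the bytes MySQL actually stored and at the character sets the connection is using. The sketch below assumes PHP 7.1 or later with the mysqli extension; dumpEncodingDiagnostics is a hypothetical helper, and the table and column names are interpolated directly, so pass only names you trust:

```php
<?php
// Hypothetical helper; expects an already open mysqli connection.
function dumpEncodingDiagnostics(mysqli $db, string $table, string $column): void
{
    // The client, connection, and results character sets should agree.
    $vars = $db->query("SHOW VARIABLES LIKE 'character_set%'");
    while ($row = $vars->fetch_assoc()) {
        echo $row['Variable_name'], ' = ', $row['Value'], PHP_EOL;
    }

    // HEX() shows the stored bytes, independent of how they are displayed.
    $rows = $db->query("SELECT HEX(`$column`) AS bytes FROM `$table` LIMIT 5");
    while ($row = $rows->fetch_assoc()) {
        echo $row['bytes'], PHP_EOL;
    }
}

// Usage, with the credentials and names from the question:
// $db = new mysqli('localhost', 'root', '', 'db1');
// $db->set_charset('utf8');
// dumpEncodingDiagnostics($db, 'TempTb1', 'arabic');
```

Arabic letters stored correctly as UTF-8 show up as byte pairs starting with D8 or D9. If you see pairs such as C398 or C399 instead, the text was most likely encoded twice on the way in, which points at the input side rather than at the SELECT.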