I’ve written a PHP script to populate a MySQL table with Unicode data. I’ve run into a few minor problems, though. For example, the character column for the space character (id = 32) is empty, even when I run the following command separately:
UPDATE unicode SET `character` = ' ' WHERE id = 32;
Any ideas? Here’s the script that I’m using to populate the table (the included script common.php defines the database() function, which returns a PDO object):
<?php
include_once('common.php');
// Fetch data from Unicode website
$file = fopen('http://www.unicode.org/Public/UNIDATA/UnicodeData.txt', 'r');
// Iterate through each line of the file
while($row = fgets($file)) {
    // Gather data
    $column = explode(';', $row);
    $id = hexdec($column[0]);
    $name = $column[1];
    $general_category = $column[2];
    $uppercase_mapping = hexdec($column[12]);
    $lowercase_mapping = hexdec($column[13]);
    $titlecase_mapping = hexdec($column[14]);
    // Build the database query
    $query = sprintf("INSERT IGNORE INTO unicode VALUES (%d, CHAR(%d USING UTF8), '%s', '%s', %d, %d, %d)",
        $id,
        $id,
        $name,
        $general_category,
        $uppercase_mapping,
        $lowercase_mapping,
        $titlecase_mapping);
    database()->query($query);
    echo $id.' ';
}
?>
If you have a CHAR column type, your database driver may be stripping off trailing spaces automatically. Some drivers may be doing this even on VARCHAR data if they are not configured to preserve them.
You can check what’s actually in the database by selecting out a hex-encoded version:
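The query for this check is not shown here. A minimal sketch, assuming the table and column names from the question (unicode, character, id) and MySQL's built-in HEX() and LENGTH() functions, might look like this:

```sql
-- Assumed names: table `unicode`, columns `id` and `character` (taken from the question).
-- HEX() shows the stored bytes as hexadecimal; LENGTH() reports the byte count.
SELECT id,
       HEX(`character`)    AS hex_value,
       LENGTH(`character`) AS byte_length
FROM unicode
WHERE id = 32;
```

Comparing the hex value with the byte length helps tell apart a value that is truly empty from one that only looks empty when printed.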
You should be seeing 20, the hexadecimal equivalent of 32.