I have the following query in MySQL:
SELECT id FROM unicode WHERE `character` = 'a'
The table unicode contains each Unicode character along with an ID (its integer code point value). Since the collation of the table is set to utf8_unicode_ci, I expected the above query to return only 97 (the letter ‘a’). Instead, it returns 119 rows containing the IDs of many ‘a’-like letters:
a A Ã …
It seems to be ignoring both case and the multi-byte nature of the characters.
Any ideas?
As documented under Unicode Character Sets:
The full collation chart makes clear that, in this collation, most variations of a base letter are equivalent irrespective of their lettercase or accent/decoration.
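A quick way to see this for yourself is to compare two literals under each collation. This is only a sketch: the _utf8 introducer is used so the result does not depend on your connection character set, and the column aliases are made up for illustration.

```sql
-- Same base letter, different case: equal under the case-insensitive collation
SELECT _utf8'a' = _utf8'A' COLLATE utf8_unicode_ci AS equal_under_unicode_ci;

-- The binary collation compares the encoded values, so these differ
SELECT _utf8'a' = _utf8'A' COLLATE utf8_bin AS equal_under_bin;
```

The first comparison should come back true and the second false; accented variants such as those in the question behave like the first case under utf8_unicode_ci.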
If you want to match only exact letters, you should use a binary collation such as utf8_bin.
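For example, you could either override the collation for a single comparison or change the column itself. The following is a sketch, not a drop-in fix: the CHAR(1) column type in the ALTER statement is an assumption, since the question does not show the table definition, so adjust it to match your schema.

```sql
-- Option 1: override the collation for this comparison only.
-- COLLATE is applied to the column, which is already utf8.
SELECT id FROM unicode WHERE `character` COLLATE utf8_bin = 'a';

-- Option 2: make the column use a binary collation permanently.
-- CHAR(1) is an assumed column type; use the real one from your table.
ALTER TABLE unicode
  MODIFY `character` CHAR(1) CHARACTER SET utf8 COLLATE utf8_bin;
```

Note that applying COLLATE to the column inside the WHERE clause may prevent MySQL from using an index on that column, so if exact matching is what you always want, changing the column collation is probably the better choice.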