Is there a collation that MySQL officially recommends for a general website where you aren’t 100% sure what will be entered? I understand that the encoding should be the same everywhere: MySQL, Apache, the HTML, and anything inside PHP.
In the past I have set PHP to output in "UTF-8", but which collation does this match in MySQL? I’m thinking it’s one of the UTF-8 ones, but I have used utf8_unicode_ci, utf8_general_ci, and utf8_bin before, and I do not know which of these "utf8" maps to, or if that is the best to use.
The main differences are sorting accuracy (when comparing characters in the language) and performance. The only special one is utf8_bin, which compares characters by their binary values, so it is case-sensitive.
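To make the distinction between a binary and a case-insensitive comparison more concrete, here is a rough Python analogy. It is only a sketch and not how MySQL implements collations: the helper names `bin_equal` and `ci_equal` are made up for illustration, and the real `_ci` collations also apply accent and language rules that this does not model.

```python
# Rough analogy only: this does NOT reproduce MySQL's collation algorithms.
# bin_equal and ci_equal are hypothetical helper names for illustration.

def bin_equal(a: str, b: str) -> bool:
    """Compare in the spirit of a *_bin collation: by encoded bytes."""
    return a.encode("utf-8") == b.encode("utf-8")

def ci_equal(a: str, b: str) -> bool:
    """Compare in the spirit of a *_ci collation: ignore letter case."""
    return a.casefold() == b.casefold()

words = ["banana", "apple", "Cherry"]

# Binary ordering: uppercase ASCII letters sort before lowercase ones.
bin_sorted = sorted(words, key=lambda w: w.encode("utf-8"))

# Case-insensitive ordering: letter case is ignored when sorting.
ci_sorted = sorted(words, key=str.casefold)

print(bin_equal("a", "A"), ci_equal("a", "A"))
print(bin_sorted)
print(ci_sorted)
```

With a binary comparison "a" and "A" are different strings and "Cherry" sorts ahead of "apple"; with a case-insensitive comparison they are equal and the list sorts the way a person would expect.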
utf8_general_ci is somewhat faster than utf8_unicode_ci, but less accurate (for sorting). The language-specific utf8 collations (such as utf8_swedish_ci) contain additional language rules that make them the most accurate for sorting in those languages. Most of the time I use utf8_unicode_ci (I prefer accuracy to small performance improvements), unless I have a good reason to prefer a specific language. You can read more about the specific Unicode character sets in the MySQL manual: http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html