I have a simple shopping cart website that uses a MySQL database to store the products. There are thousands of products, so the items can be managed both through a web-based interface and by generating a TSV file, downloading it, editing it, and re-uploading it. The site then parses the changed file and applies the corresponding changes as it goes.
You can imagine the nightmares I have been facing with character encoding. My question is this: is there a common, efficient practice for encoding, storing, retrieving, and decoding data for use across CSV, MySQL, and the web?
I am finding that the admins may enter a description in the file that is simply copied and pasted from somewhere else. That description may contain special characters such as copyright and trademark symbols, and even ‘power to’ and ‘squared’ math characters.
What would be the best method to ensure that these special characters are kept intact in the database and display correctly on the website? And when the data is downloaded as a TSV file, how do I make sure it is encoded in a format that Excel(R) will display as the special character rather than as a character code?
As always, any feedback or guidance is appreciated.
Thanks
Simply use UTF-8 at every step of the process (with a UTF-8 BOM when generating the CSV, so that Excel on Windows detects the encoding) and you should avoid most of these problems.
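To illustrate the BOM advice above, here is a minimal Python sketch of the export and re-import steps. The function names `export_tsv` and `import_tsv` and the sample row are hypothetical, and your site may well be written in another language; the point is only that the file is written as UTF-8 with a leading BOM and decoded as UTF-8 again on upload.

```python
import csv
import io


def export_tsv(rows):
    """Return TSV bytes encoded as UTF-8 with a leading BOM."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter="\t", lineterminator="\r\n")
    writer.writerows(rows)
    # The "utf-8-sig" codec prepends the BOM bytes EF BB BF when encoding.
    return buf.getvalue().encode("utf-8-sig")


def import_tsv(data):
    """Parse uploaded TSV bytes; a leading BOM is dropped if present."""
    # Raises UnicodeDecodeError if the upload is not valid UTF-8,
    # which is safer than silently storing mangled characters.
    text = data.decode("utf-8-sig")
    return list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))


# Hypothetical sample data with trademark, copyright, and squared symbols.
rows = [["sku", "description"], ["A1", "Widget™ © 2 m²"]]
payload = export_tsv(rows)
restored = import_tsv(payload)
```

Decoding with `utf-8-sig` accepts files both with and without the BOM, so an upload still parses if the editing program stripped it. If the editor saved the file in a different encoding altogether, the decode fails loudly instead of corrupting the descriptions.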
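Declare UTF-8 in your HTML files and/or server headers, and use a UTF-8 character set for your tables and for the MySQL connection itself, and everything should work without a problem. In MySQL, prefer utf8mb4, since the older utf8 character set stores at most three bytes per character.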
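As a rough sketch of the header advice above, the following Python WSGI application declares UTF-8 both in the `Content-Type` header and in a `<meta>` tag, and encodes the page body to match. The names `PAGE` and `app` and the page content are made up for illustration; the same two declarations apply in whatever server stack you actually use.

```python
# Hypothetical page containing the kinds of characters from the question.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Product</title>
</head>
<body><p>Widget™ © 2 m²</p></body>
</html>
"""


def app(environ, start_response):
    """Minimal WSGI app that serves PAGE as UTF-8."""
    body = PAGE.encode("utf-8")
    headers = [
        # The charset here must match the encoding used for the body bytes.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, something like `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library should be enough.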