This is the second time in a few weeks that I’ve been stuck on an encoding issue. I’ve spent such a long time on this problem already, and I’d appreciate any help I can get.
This is what I want to do:
1) Select some rows from a MySQL table on my computer.
2) Write these rows into a text file.
3) Transfer the text file over to my Amazon EC2 Ubuntu instance.
4) Write the contents of the text file into a MySQL database.
5) Get Django to select some rows from the database in #4.
6) Show on the website.
In step #1, I just had an ordinary SELECT statement.
In step #2, I did this:
file = codecs.open('commentsfordjango.txt', encoding = 'utf-8', mode='w')
file.write(fullcomment.decode('utf8') + '\n\n\n\n\n\n')
After step #2, I opened the .txt file in Windows and I could see all the actual Chinese characters without any error.
In step #3, I just transferred the file using WinSCP.
In step #4, I did this:
file = open('/usr/local/src/blog/commentsfordjango.txt', 'r')
cursor.execute("INSERT INTO polls_poll (commenttext, pos, neu, neg) VALUES (%s, 0, 0, 0)", line)
In step #5, I did this in views.py: I simply returned the object corresponding to the model. My model has a __unicode__ method, but I did not call it explicitly, since I read that it is called by default when the object is rendered.
In step #6, my HTML file has the following line at the top of the file:
<meta charset="utf-8" />
Also, I changed my Apache encoding default to Unicode. I also made sure that my SQL database in step #4 is in Unicode.
However, after all this, my website still shows a bunch of unreadable, weird characters as such: 人在åšï¼Œå¤©åœ¨çœ‹ã€.
Any help will be very much appreciated – I’ve tried so many variations involving .decode('utf-8') and .encode('utf-8') and spent far too long on this problem already!
In Step #2, you should encode your text as UTF-8 when you write it to the file.
In Step #4, when you read the file back, you can then decode the data into unicode before inserting it.
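As a rough illustration of the encode-on-write, decode-on-read round trip, here is a minimal sketch. It assumes io.open (available in both Python 2.6+ and Python 3) rather than the codecs.open call from the question, and the helper names write_comments and read_comments, the sample comment, and the temporary directory are made up for the example.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_comments(path, comments):
    """Write unicode strings to path; io.open encodes them as UTF-8."""
    with io.open(path, mode='w', encoding='utf-8') as f:
        for comment in comments:
            f.write(comment + u'\n')


def read_comments(path):
    """Read path back; io.open decodes the UTF-8 bytes into unicode."""
    with io.open(path, mode='r', encoding='utf-8') as f:
        return [line.rstrip(u'\n') for line in f]


# Hypothetical sample data and location, only for demonstration.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'commentsfordjango.txt')
original = [u'人在做，天在看']

write_comments(path, original)
restored = read_comments(path)
```

The point is that the text is unicode everywhere inside the program and is only bytes on disk. The plain open(..., 'r') in Step #4 of the question skips the decoding half under Python 2, so the database driver receives raw bytes and has to guess their encoding.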
A better solution would be to just use Django’s built-in loaddata/dumpdata facilities.
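If you want to check where the text gets damaged, it may help to reproduce the garbling on purpose. The sketch below assumes the UTF-8 bytes are being interpreted as Windows-1252 (cp1252) somewhere between the file and the browser; that is a guess based on how the output in the question looks, and the real culprit could also be a latin-1 database connection. The sample text is chosen for the example.

```python
# -*- coding: utf-8 -*-
# Assumption: the bytes are valid UTF-8 but get decoded as cp1252 somewhere.
text = u'天在看'

utf8_bytes = text.encode('utf-8')        # what is stored in the file
garbled = utf8_bytes.decode('cp1252')    # wrong decoding produces mojibake

# Reversing the wrong step recovers the original text, which suggests
# the data itself is intact and only the decoding is wrong.
repaired = garbled.encode('cp1252').decode('utf-8')
```

Here garbled comes out as å¤©åœ¨çœ‹, which matches part of the output shown in the question. If the same round trip repairs the strings coming out of your database, the bytes were probably stored or fetched with the wrong connection character set rather than corrupted in the text file.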