This may not really be a Python related question, but pertains to language encoding in general. I’m mining tweets from Twitter, and it appears that there is a large Japanese user community (with messages in Japanese). When I tried encoding the tweets for an XML file I used utf-8. e.g tweet=tweet.encode(‘utf-8’) and none of the Japanese tweets appeared as they should have. My question that I am posing is, how should I have encoded them? What was my mistake? If I was to store the data in a CSV, what encoding scheme would I use in that case?
This may not really be a Python related question, but pertains to language encoding
Share
Normally you would query the format for what encoding the data is in. Having said that, Shift-JIS is quite a popular encoding for Japanese text.