I can't find the documentation at the moment, but there…

Question

Asked: May 13, 2026

This may not really be a Python related question, but pertains to language encoding

This may not really be a Python-related question; it pertains to text encoding in general. I'm mining tweets from Twitter, and there appears to be a large Japanese user community (with messages in Japanese). When I encoded the tweets for an XML file I used UTF-8, e.g. tweet = tweet.encode('utf-8'), and none of the Japanese tweets appeared as they should have. How should I have encoded them? What was my mistake? If I were to store the data in a CSV file, what encoding scheme would I use in that case?


1 Answer

Editorial Team · Answer 1 · 2026-05-13T16:56:24+00:00

Added an answer on May 13, 2026 at 4:56 pm

Normally you would check which encoding the format declares or expects, rather than guess. Having said that, Shift-JIS is quite a popular encoding for Japanese text.

>>> 'あいうえお'.encode('shift-jis')
b'\x82\xa0\x82\xa2\x82\xa4\x82\xa6\x82\xa8'
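
To connect this back to the XML and CSV cases in the question, here is a minimal Python 3 sketch. It assumes the tweet is already a decoded text string. The sample text, the element names `tweets` and `tweet`, and the column names are hypothetical. It is only meant to show that UTF-8 can also carry Japanese text, and that garbling tends to come from reading the bytes back with a different codec than the one used to write them.

```python
import csv
import io
import xml.etree.ElementTree as ET

tweet = "あいうえお"  # hypothetical sample tweet text

# Both codecs can represent this text; they simply produce different bytes.
utf8_bytes = tweet.encode("utf-8")
sjis_bytes = tweet.encode("shift-jis")

# Decoding with the same codec used for encoding restores the text.
assert utf8_bytes.decode("utf-8") == tweet
assert sjis_bytes.decode("shift-jis") == tweet

# Decoding with a different codec yields garbled characters instead.
garbled = utf8_bytes.decode("latin-1")

# XML: write with an explicit declaration so readers know the encoding.
root = ET.Element("tweets")            # hypothetical element names
ET.SubElement(root, "tweet").text = tweet
xml_buf = io.BytesIO()
ET.ElementTree(root).write(xml_buf, encoding="utf-8", xml_declaration=True)
xml_bytes = xml_buf.getvalue()
xml_roundtrip = ET.fromstring(xml_bytes).find("tweet").text

# CSV: the format has no encoding declaration, so the reader must be told.
# "utf-8-sig" prepends a byte order mark, which some spreadsheet
# programs use to detect UTF-8.
csv_text = io.StringIO(newline="")
writer = csv.writer(csv_text)
writer.writerow(["id", "text"])        # hypothetical column names
writer.writerow([1, tweet])
csv_bytes = csv_text.getvalue().encode("utf-8-sig")

decoded = csv_bytes.decode("utf-8-sig")
rows = list(csv.reader(io.StringIO(decoded, newline="")))
```

If the Japanese text looks wrong after a step like this, the viewer or the downstream reader may be assuming a different encoding from the one the file was written in, so that is worth checking before switching codecs.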
