I fetched a web page that contains Japanese, but when I print it to the console I don't get 7月10日. Instead, it prints: 7\xe6\x9c\x8810\xe6\x97\xa5
What should I do?
The output you get is correct: those escapes are the UTF-8 representation of the Japanese string. The problem is the console, which doesn't understand UTF-8. If you write the string to a file and open it in an editor that does understand UTF-8, you'll see the content you expect. You could also try changing the console's encoding to UTF-8.
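To check this yourself, here is a minimal sketch, assuming Python 3 and that the fetched data is a `bytes` object. The variable names and the file name `date_utf8.txt` are made up for illustration; they are not from the question.

```python
import os
import tempfile

# The bytes from the question: the UTF-8 encoding of the Japanese date.
raw = b"7\xe6\x9c\x8810\xe6\x97\xa5"

# Decoding interprets the bytes as UTF-8 and yields a text string.
text = raw.decode("utf-8")

# Write the text to a file, stating the encoding explicitly.
# The file name is hypothetical; any path works.
path = os.path.join(tempfile.gettempdir(), "date_utf8.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading the file back as UTF-8 gives the same characters.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

Opening that file in a UTF-8 aware editor should show 7月10日, which suggests the data itself is fine and only the display is at fault.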
Edit: You could also try decoding the string from UTF-8 and re-encoding it in the console's encoding.
But whether this works depends on whether the console encoding supports Japanese characters. If, for example, the console's encoding is ISO Latin-1, then it won't work.
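As a rough sketch of how the console encoding affects the result, assuming Python 3: the helper `for_console` below is hypothetical, not a library function. It decodes the UTF-8 bytes and then re-encodes the text for a given encoding, replacing anything that encoding cannot represent.

```python
import sys

def for_console(raw, console_encoding=None):
    """Decode UTF-8 bytes, then reduce the text to what the console encoding can show."""
    text = raw.decode("utf-8")
    # Fall back to the real console encoding, or ASCII if it is unknown.
    enc = console_encoding or sys.stdout.encoding or "ascii"
    # errors="replace" substitutes "?" for unsupported characters instead of raising.
    return text.encode(enc, errors="replace").decode(enc)

raw = b"7\xe6\x9c\x8810\xe6\x97\xa5"

utf8_view = for_console(raw, "utf-8")      # all characters survive
latin1_view = for_console(raw, "latin-1")  # the two kanji become "?", giving "7?10?"
```

Calling `print(for_console(raw))` uses whatever encoding the current console reports. On a Latin-1 console the Japanese characters are lost, which is the failure case described above.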
I suggest you read: http://www.joelonsoftware.com/articles/Unicode.html