In Python, I’m reading a file that contains Romanian words with file.readline().
I have a problem with many characters because of the encoding.
Example:
>>> a = "aberație"  # type 'str'
>>> a
'abera\xc8\x9bie'
>>> print sys.stdin.encoding
UTF-8
I’ve tried encode() with utf-8, cp500, etc., but it doesn’t work.
Which character encoding should I use?
Thanks in advance.
Edit: The aim is to store each word from the file in a dictionary and, when printing it, to get aberație and not 'abera\xc8\x9bie'.
What are you trying to do?
This is a sequence of bytes: 'abera\xc8\x9bie'
It’s a sequence of bytes which represents a UTF-8 encoding of the string “aberație”. You decode the bytes to get your unicode string.
If you want to store the unicode string to a file, then you have to encode it to a particular byte format of your choosing.
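Here is a minimal sketch of that decode/encode round trip. It assumes the file is UTF-8 encoded, which the \xc8\x9b byte pair in the question suggests. The file name words.txt and the dictionary layout are hypothetical, chosen only for illustration. It uses io.open, which decodes and encodes for you and is available on both Python 2.6+ and Python 3:

```python
import io
import os
import tempfile

# The bytes shown in the question: "aberație" encoded as UTF-8.
raw = b'abera\xc8\x9bie'

# Decode: bytes -> unicode string. The two bytes \xc8\x9b become one character.
text = raw.decode('utf-8')

# Encode: unicode string -> bytes, in an encoding of your choosing.
encoded = text.encode('utf-8')

# Hypothetical file path, created in a temp directory for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'words.txt')

# io.open with an explicit encoding encodes on write...
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text + u'\n')

# ...and decodes on read, so each line is already a unicode string.
words = {}
with io.open(path, 'r', encoding='utf-8') as f:
    for line in f:
        word = line.strip()
        words[word] = len(word)  # hypothetical value: the word's length
```

Note that in Python 2, typing a name at the interactive prompt, or printing a whole dictionary, shows the repr of each string, escapes included. Printing the individual word (print word) shows aberație, provided your terminal uses UTF-8.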