I’m reading a file that contains Romanian words in Python with file.readline(). I’ve got


Asked: May 23, 2026


I’m reading a file that contains Romanian words in Python with file.readline().
I have problems with many characters because of the encoding.

Example:

>>> import sys
>>> a = "aberație"  # type 'str'
>>> a
'abera\xc8\x9bie'
>>> print sys.stdin.encoding
UTF-8

I’ve tried encode() with utf-8, cp500, etc., but it doesn’t work.

Which character encoding should I use?

Thanks in advance.

Edit: The aim is to store each word from the file in a dictionary and, when printing it, to get aberație and not 'abera\xc8\x9bie'.


1 Answer

Editorial Team · Answer 1 · 2026-05-23T12:43:05+00:00

What are you trying to do?

This is a sequence of bytes:

BYTES = 'abera\xc8\x9bie'

It is the UTF-8 encoding of the string “aberație”. Decode the bytes to get a unicode string:

>>> BYTES 
'abera\xc8\x9bie'
>>> print BYTES 
aberaÈ›ie
>>> abberation = BYTES.decode('utf-8')
>>> abberation 
u'abera\u021bie'
>>> print abberation 
aberație
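Applied to the question's case of reading words from a file, a minimal sketch might look like the following. It assumes the file is UTF-8 encoded with one word per line; the helper name `load_words`, the word-to-length mapping, and the temporary demo file are hypothetical and only there for illustration. `io.open` takes an `encoding` argument in both Python 2.6+ and Python 3, so every line arrives already decoded.

```python
import io
import os
import tempfile


def load_words(path, encoding="utf-8"):
    """Hypothetical helper: read one word per line into a dict (word -> length)."""
    words = {}
    # io.open decodes the bytes while reading, so each line is a unicode string.
    with io.open(path, "r", encoding=encoding) as handle:
        for line in handle:
            word = line.strip()
            if word:
                words[word] = len(word)
    return words


# Demo: write the raw UTF-8 bytes from the question to a temporary file.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(demo_path, "wb") as handle:
    handle.write(b"abera\xc8\x9bie\n")

words = load_words(demo_path)
os.remove(demo_path)
```

Note that in Python 2, printing the whole dictionary shows the escaped repr of each key (u'abera\u021bie'), while printing an individual key shows the characters themselves, provided the terminal encoding can represent them.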

If you want to store the unicode string to a file, then you have to encode it to a particular byte format of your choosing:

>>> abberation.encode('utf-8')
'abera\xc8\x9bie'
>>> abberation.encode('utf-16')
'\xff\xfea\x00b\x00e\x00r\x00a\x00\x1b\x02i\x00e\x00'
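A round trip through a file could be sketched as below, assuming UTF-8 is the format you choose. The temporary file and the variable names are illustrative only. Passing `encoding` to `io.open` makes the file object do the encoding on write and the decoding on read.

```python
import io
import os
import tempfile

word = u"abera\u021bie"

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# Writing: the unicode string is encoded to UTF-8 bytes by the file object.
# newline="\n" keeps the line ending identical on every platform.
with io.open(path, "w", encoding="utf-8", newline="\n") as handle:
    handle.write(word + u"\n")

# Reading in binary mode shows the bytes that actually landed on disk.
with io.open(path, "rb") as handle:
    raw = handle.read()

# Reading in text mode with the same encoding gives the unicode string back.
with io.open(path, "r", encoding="utf-8") as handle:
    restored = handle.readline().strip()

os.remove(path)
```

Here `raw` holds the same bytes as the `encode('utf-8')` result above plus a trailing newline, and `restored` compares equal to `word`.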
