Consider the next example: >>> s = u"баба" >>> s u'\xe1\xe0\xe1\xe0' >>> print s


Asked: May 28, 2026


Consider the following example:

>>> s = u"баба"
>>> s
u'\xe1\xe0\xe1\xe0'
>>> print s
áàáà

I'm using the cp1251 encoding in IDLE, but it seems the interpreter actually uses latin1 to create the unicode string:

>>> print s.encode('latin1')
баба

Why is that? Is there a specification for this behavior?

CPython 2.7.

Edit

The code I was actually looking for is

>>> u'\xe1\xe0\xe1\xe0' == u'\u00e1\u00e0\u00e1\u00e0'
True

It seems that when a unicode string is encoded with the latin1 codec, every code point below 256 is mapped to the byte with the same value, which yields exactly the bytes I typed in before.
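The behavior described in the edit can be checked directly. This is a minimal sketch in Python 3 rather than the 2.7 used in the question (an assumption for readability: Python 3's `str` plays the role of 2.7's `unicode`, and `bytes` the role of 2.7's `str`). It shows that Latin-1 maps every code point from U+0000 to U+00FF onto the byte with the same value, and rejects anything higher.

```python
# Python 3 sketch: Latin-1 (ISO-8859-1) is a one-to-one mapping between
# code points U+0000..U+00FF and byte values 0x00..0xFF.

s = "\xe1\xe0\xe1\xe0"  # the string from the question, i.e. 'áàáà'
assert s == "\u00e1\u00e0\u00e1\u00e0"  # same equality as in the edit

encoded = s.encode("latin-1")
print(encoded)  # b'\xe1\xe0\xe1\xe0'

# Each code point below 256 becomes the byte with the same numeric value.
for ch, byte in zip(s, encoded):  # iterating bytes yields ints in Python 3
    assert ord(ch) == byte

# The same holds for the whole range, in both directions.
all_points = "".join(chr(i) for i in range(256))
assert all_points.encode("latin-1") == bytes(range(256))
assert bytes(range(256)).decode("latin-1") == all_points

# Code points at or above 256 have no Latin-1 byte at all.
try:
    "\u0431".encode("latin-1")  # CYRILLIC SMALL LETTER BE
except UnicodeEncodeError as exc:
    print("cannot encode U+0431:", exc.reason)
```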


1 Answer

Editorial Team · Answer 1 · 2026-05-28T06:58:48+00:00

When you type a character such as б into the terminal, you see a б, but what is actually passed to Python is a sequence of bytes.

Since your terminal encoding is cp1251, typing баба produces the same bytes you get by encoding the unicode string баба in cp1251:

In [219]: "баба".decode('utf-8').encode('cp1251')
Out[219]: '\xe1\xe0\xe1\xe0'

(Note I use utf-8 above because my terminal encoding is utf-8, not cp1251. For me, "баба".decode('utf-8') is just unicode for баба.)
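The byte-level claim can be reproduced without depending on any terminal. A minimal Python 3 sketch (an assumption; the thread uses 2.7) that starts from real Cyrillic code points and encodes them with both codecs mentioned above:

```python
# Python 3 sketch: the bytes that 'баба' turns into under two encodings.
word = "\u0431\u0430\u0431\u0430"  # 'баба' as real Unicode code points

cp1251_bytes = word.encode("cp1251")
utf8_bytes = word.encode("utf-8")

print(cp1251_bytes)  # b'\xe1\xe0\xe1\xe0'  (what a cp1251 terminal sends)
print(utf8_bytes)    # b'\xd0\xb1\xd0\xb0\xd0\xb1\xd0\xb0'  (what a utf-8 terminal sends)

# In cp1251, б is the single byte 0xE1 and а is 0xE0.
assert cp1251_bytes == b"\xe1\xe0\xe1\xe0"
```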

Since typing баба produces the bytes \xe1\xe0\xe1\xe0, when you type u"баба" into the terminal, Python receives those four bytes inside the u"" literal and turns each byte into the code point with the same value, giving u'\xe1\xe0\xe1\xe0'. This is why you are seeing

>>> s
u'\xe1\xe0\xe1\xe0'

This unicode string happens to represent áàáà.
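A Python 3 sketch (an assumption; the thread uses 2.7) that reproduces the mix-up on purpose: taking the cp1251 bytes and treating each byte as the code point of the same value, which is exactly a Latin-1 decode, yields accented Latin letters. `ascii()` is used because, unlike Python 2's `repr`, Python 3's `repr` would show the characters rather than their escapes.

```python
import unicodedata

# Bytes produced by typing 'баба' in a cp1251 terminal.
raw = "\u0431\u0430\u0431\u0430".encode("cp1251")

# Treat each byte as the code point with the same value (a Latin-1 decode).
s = raw.decode("latin-1")

print(ascii(s))  # '\xe1\xe0\xe1\xe0'  (same escapes as the question's repr)
print([unicodedata.name(c) for c in s[:2]])
# ['LATIN SMALL LETTER A WITH ACUTE', 'LATIN SMALL LETTER A WITH GRAVE']

assert s == "\xe1\xe0\xe1\xe0" == "\u00e1\u00e0\u00e1\u00e0"
```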

And when you type

>>> print s.encode('latin1')

the latin1 encoding converts u'\xe1\xe0\xe1\xe0' to '\xe1\xe0\xe1\xe0'.
The terminal receives the sequence of bytes '\xe1\xe0\xe1\xe0', and decodes them with cp1251, thus printing баба:

In [222]: print('\xe1\xe0\xe1\xe0'.decode('cp1251'))
баба
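The two steps (a Latin-1 encode in Python, then a cp1251 decode in the terminal) can be chained in one place. A minimal Python 3 sketch, assuming the terminal's cp1251 decoding is modeled by `bytes.decode("cp1251")`:

```python
s = "\xe1\xe0\xe1\xe0"  # the mis-built string, 'áàáà'

# Step 1: s.encode('latin1') undoes the byte-to-code-point mapping...
raw = s.encode("latin-1")
assert raw == b"\xe1\xe0\xe1\xe0"

# Step 2: ...and a cp1251 terminal decodes those bytes as Cyrillic.
shown = raw.decode("cp1251")
assert shown == "\u0431\u0430\u0431\u0430"  # 'баба'
print(ascii(shown))  # '\u0431\u0430\u0431\u0430'
```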

Try:

>>> s = "баба"

(without the u) instead. Or,

>>> s = "баба".decode('cp1251')

to make s unicode. Or, use the verbose but very explicit (and terminal-encoding agnostic):

>>> s = u'\N{CYRILLIC SMALL LETTER BE}\N{CYRILLIC SMALL LETTER A}\N{CYRILLIC SMALL LETTER BE}\N{CYRILLIC SMALL LETTER A}'

Or the shorter but less readable

>>> s = u'\u0431\u0430\u0431\u0430'
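All of the unicode alternatives above should produce the same string. A quick Python 3 check (an assumption; in 2.7 each literal would need the u prefix), where the `cp1251` decode stands in for bytes typed in a cp1251 terminal:

```python
import unicodedata

# Explicit and terminal-encoding agnostic: characters by name.
by_name = (
    "\N{CYRILLIC SMALL LETTER BE}\N{CYRILLIC SMALL LETTER A}"
    "\N{CYRILLIC SMALL LETTER BE}\N{CYRILLIC SMALL LETTER A}"
)
# Short but less readable: code points by number.
by_escape = "\u0431\u0430\u0431\u0430"
# Decoding the bytes a cp1251 terminal would send.
by_decode = b"\xe1\xe0\xe1\xe0".decode("cp1251")

assert by_name == by_escape == by_decode
print([unicodedata.name(c) for c in by_escape])
# ['CYRILLIC SMALL LETTER BE', 'CYRILLIC SMALL LETTER A',
#  'CYRILLIC SMALL LETTER BE', 'CYRILLIC SMALL LETTER A']
```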
