I have code like this:
a = "\u0432"
b = u"\u0432"
c = b"\u0432"
d = c.decode('utf8')
print(type(a), a)
print(type(b), b)
print(type(c), c)
print(type(d), d)
And the output is:
<class 'str'> в
<class 'str'> в
<class 'bytes'> b'\\u0432'
<class 'str'> \u0432
Why do I see the character's escape code in the last case, instead of the character itself?
How can I convert the byte string to a Unicode string so that printing it shows the character rather than its code?
In strings (or Unicode objects in Python 2), \u has a special meaning: it says, “here comes a Unicode character specified by its Unicode code point”. Hence u"\u0432" results in the character в.
The b'' prefix tells you this is a sequence of 8-bit bytes, and a bytes object has no Unicode characters, so the \u code has no special meaning. Hence, b"\u0432" is just the sequence of the six bytes \, u, 0, 4, 3 and 2. Essentially, you have an 8-bit string containing not a Unicode character, but the specification of a Unicode character.
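To make the difference concrete, here is a small sketch that inspects both objects. The variable names raw and text are just for illustration, and the bytes literal is written with a doubled backslash, which produces the same six bytes as b"\u0432" while avoiding the invalid-escape warning that newer Python versions emit.

```python
# The bytes object holds six ASCII bytes: backslash, 'u', '0', '4', '3', '2'.
raw = b"\\u0432"
print(len(raw))    # 6
print(list(raw))   # [92, 117, 48, 52, 51, 50]

# The str object holds a single character, U+0432.
text = "\u0432"
print(len(text))             # 1
print(text.encode("utf-8"))  # b'\xd0\xb2'
```

Decoding raw as UTF-8 therefore just gives back the six ASCII characters, because none of those bytes encodes в.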
You can convert this specification into the actual character by decoding the bytes with the unicode_escape codec instead of utf8, i.e. c.decode('unicode_escape').
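A minimal sketch of that conversion, reusing the names c and d from the question (the doubled backslash again yields the same bytes as b"\u0432"):

```python
c = b"\\u0432"                   # six bytes: \ u 0 4 3 2
d = c.decode("unicode_escape")   # interpret the \uXXXX escape while decoding
print(type(d), d)                # <class 'str'> в
print(len(d))                    # 1
```

One caveat: unicode_escape treats any bytes that are not part of an escape sequence as Latin-1, so this approach suits input that is pure ASCII with escapes. If the bytes also contain real UTF-8 encoded text, those parts would likely come out garbled.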