I’ve never been sure that I understand the difference between str/unicode decode and encode.
I know that str().decode() is for when you have a string of bytes that you know has a certain character encoding; given that encoding name, it will return a unicode string.
I know that unicode().encode() converts unicode chars into a string of bytes according to a given encoding name.
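For example, here is the round trip I have in mind, as a Python 2 session (ä / u'\xe4' is just an arbitrary non-ASCII character picked for illustration):

```python
>>> b = '\xc3\xa4'           # the UTF-8 bytes for ä, in a plain (byte) string
>>> b.decode('utf-8')        # bytes -> unicode, given the bytes' encoding
u'\xe4'
>>> u'\xe4'.encode('utf-8')  # unicode -> bytes, in the requested encoding
'\xc3\xa4'
```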
But I don’t understand what str().encode() and unicode().decode() are for. Can anyone explain, and possibly also correct anything else I’ve gotten wrong above?
EDIT:
Several answers give info on what .encode does on a string, but no one seems to know what .decode does for unicode.
The decode method of unicode strings really doesn’t have any applications at all (unless you have some non-text data in a unicode string for some reason; see below). It is mainly there for historical reasons, I think. In Python 3 it is completely gone.

unicode().decode() will perform an implicit encoding of s using the default (ascii) codec. Verify this like so (the error messages are exactly the same):
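A minimal sketch of what that looks like in a Python 2 session (again ä / u'\xe4' is just an arbitrary non-ASCII character, and the traceback bodies are trimmed):

```python
>>> s = u'\xe4'
>>> s.decode('utf-8')    # .decode on unicode: the implicit ascii *encode* fails first
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)
>>> s.encode('ascii')    # the explicit encode fails with the identical message
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)
```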
For str().encode() it’s the other way around; it attempts an implicit decoding of s with the default encoding:
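A matching sketch, with '\xc3\xa4' (the UTF-8 bytes for ä) standing in for any non-ASCII byte string:

```python
>>> s = '\xc3\xa4'
>>> s.encode('utf-8')    # .encode on a byte string: the implicit ascii *decode* fails first
Traceback (most recent call last):
  ...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
```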
Used like this, str().encode() is also superfluous.

But there is another application of the latter method that is useful: there are encodings that have nothing to do with character sets, and thus can be applied to 8-bit strings in a meaningful way:
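For instance, the base64, rot13 and zip codecs all accept plain byte strings in Python 2 (the zlib-compressed output of 'zip' is a short binary blob, so it isn't reproduced literally here):

```python
>>> 'hello world'.encode('base64')
'aGVsbG8gd29ybGQ=\n'
>>> 'hello world'.encode('rot13')
'uryyb jbeyq'
>>> packed = 'hello world'.encode('zip')   # zlib-compressed byte string
>>> packed.decode('zip')
'hello world'
```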
You are right, though: the ambiguous usage of ‘encoding’ for both these applications is… awkward. Again, with separate byte and string types in Python 3, this is no longer an issue.