I’ve never been sure that I understand the difference between str/unicode decode and encode.
I know that str().decode() is for when you have a string of bytes that you know has a certain character encoding; given that encoding name, it will return a unicode string.
I know that unicode().encode() converts unicode chars into a string of bytes according to a given encoding name.
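For example, here is the round trip I have in mind, as a Python 2 session (ä / u'\xe4' is just an arbitrary non-ASCII character picked for illustration):

```python
>>> b = '\xc3\xa4'           # the UTF-8 bytes for ä, in a plain (byte) string
>>> b.decode('utf-8')        # bytes -> unicode, given the bytes' encoding
u'\xe4'
>>> u'\xe4'.encode('utf-8')  # unicode -> bytes, in the requested encoding
'\xc3\xa4'
```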
But I don’t understand what str().encode() and unicode().decode() are for. Can anyone explain, and possibly also correct anything else I’ve gotten wrong above?
EDIT:
Several answers give info on what .encode does on a string, but no one seems to know what .decode does for unicode.
The decode method of unicode strings really doesn’t have any applications at all (unless you have some non-text data in a unicode string for some reason; see below). It is mainly there for historical reasons, I think. In Python 3 it is completely gone.

unicode().decode() will perform an implicit encoding of s using the default (ascii) codec. Verify this like so (the error messages are exactly the same):
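A minimal sketch of what that looks like in a Python 2 session (again ä / u'\xe4' is just an arbitrary non-ASCII character, and the traceback bodies are trimmed):

```python
>>> s = u'\xe4'
>>> s.decode('utf-8')    # .decode on unicode: the implicit ascii *encode* fails first
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)
>>> s.encode('ascii')    # the explicit encode fails with the identical message
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)
```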
For str().encode() it’s the other way around; it attempts an implicit decoding of s with the default encoding:
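A matching sketch, with '\xc3\xa4' (the UTF-8 bytes for ä) standing in for any non-ASCII byte string:

```python
>>> s = '\xc3\xa4'
>>> s.encode('utf-8')    # .encode on a byte string: the implicit ascii *decode* fails first
Traceback (most recent call last):
  ...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
```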
Used like this, str().encode() is also superfluous.

But there is another application of the latter method that is useful: there are encodings that have nothing to do with character sets, and thus can be applied to 8-bit strings in a meaningful way:
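For instance, the base64, rot13 and zip codecs all accept plain byte strings in Python 2 (the zlib-compressed output of 'zip' is a short binary blob, so it isn't reproduced literally here):

```python
>>> 'hello world'.encode('base64')
'aGVsbG8gd29ybGQ=\n'
>>> 'hello world'.encode('rot13')
'uryyb jbeyq'
>>> packed = 'hello world'.encode('zip')   # zlib-compressed byte string
>>> packed.decode('zip')
'hello world'
```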
You are right, though: the ambiguous usage of ‘encoding’ for both these applications is… awkward. Again, with separate byte and string types in Python 3, this is no longer an issue.