Among all the encodings listed at http://docs.python.org/library/codecs.html
which one should I use to decode binary data into unicode so that it is not corrupted when I encode it back to a string?
I’ve tried raw_unicode_escape and it doesn’t work.
Example: I upload a picture in a POST (but not as a file attachment). Django converts the POST data to unicode using utf-8. However, when I convert it back from unicode to a string (again using utf-8), the data is corrupted. I tried raw_unicode_escape and the same thing happened (though only a few bytes were affected this time). Which encoding should I use so that the decode and encode steps don’t corrupt the data?
“Binary data” is not text, therefore converting it to a unicode is meaningless. If there is text embedded in the binary data, then extract it first and decode it using the encoding given in the specification for the data format.
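
To illustrate why the utf-8 round trip fails, here is a minimal sketch in Python 3 syntax. The sample bytes are made up to stand in for the picture data. The base64 step is only one common way to carry binary data through a text-only channel, and it assumes you can change the client to send the data in that form; it is not something the question or Django prescribes.

```python
import base64

# Hypothetical sample: every possible byte value, standing in for image data.
raw = bytes(range(256))

# Arbitrary bytes are usually not valid UTF-8, so a strict decode fails.
try:
    raw.decode("utf-8")
    utf8_decodes = True
except UnicodeDecodeError:
    utf8_decodes = False

# A lenient decode replaces the invalid bytes, so the original data is lost.
lossy = raw.decode("utf-8", errors="replace").encode("utf-8")
utf8_round_trip_ok = (lossy == raw)

# Base64 turns the bytes into plain ASCII text, which can be decoded and
# re-encoded as text without damage, then converted back to the same bytes.
text = base64.b64encode(raw).decode("ascii")
restored = base64.b64decode(text.encode("ascii"))
base64_round_trip_ok = (restored == raw)

print(utf8_decodes, utf8_round_trip_ok, base64_round_trip_ok)
```

The point of the sketch is that no choice of text codec on the server fixes data that was already decoded lossily; the binary payload has to be kept as bytes (for example as a file upload) or wrapped in a text-safe form before it is sent.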