I have a set of UTF-8 octets and I need to convert them back to Unicode code points. How can I do this in Python?
E.g. the UTF-8 octets ['0xc5', '0x81'] should be converted to the code point 0x141.
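Python 3.x:
In Python 3.x, str is the class for Unicode text, and bytes is for containing octets.
If by "octets" you really mean strings in the form '0xc5' (rather than '\xc5'), you can convert them to bytes by parsing each string as an integer, for example with int(x, 16), and passing the resulting integers to the bytes constructor.
You can then convert to str (i.e. Unicode) using the str constructor, str(data, 'utf-8'), where data is the bytes object, or by calling .decode('utf-8') on the bytes object.
Pre-3.x:
Prior to 3.x, the str type was a byte array, and unicode was for Unicode text.
Again, if by "octets" you really mean strings in the form '0xc5' (rather than '\xc5'), you can convert them to a str by turning each string into a single byte with chr(int(x, 16)) and joining the results with ''.join(...).
You can then convert to unicode using the constructor, unicode(data, 'utf-8'), where data is the byte str, or by calling .decode('utf-8') on the str.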
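The Python 3.x steps described in this answer can be sketched roughly as follows. This is a minimal illustration, not the only way to do it: the variable names (octets, data, text, code_points) are made up for the example, the input is assumed to be a list of hex strings as in the question, and the final ord() step is an addition that turns the decoded text into integer code points.

```python
# Hypothetical input: octets given as hex strings, as in the question.
octets = ['0xc5', '0x81']

# int(x, 16) accepts an optional '0x' prefix; bytes() takes an iterable
# of integers in the range 0..255.
data = bytes(int(x, 16) for x in octets)

# Two equivalent ways to decode UTF-8 bytes into a str (Unicode text).
text = str(data, 'utf-8')
same_text = data.decode('utf-8')

# ord() gives the integer code point of each character in the text.
code_points = [ord(ch) for ch in text]
print([hex(cp) for cp in code_points])  # ['0x141']
```

If the octets are malformed UTF-8, both decoding calls raise UnicodeDecodeError, so it may be worth catching that where the input is not trusted.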