I need to test if a string is Unicode, and then if it whether


Asked: June 9, 2026


I need to test if a string is Unicode, and then whether it’s UTF-8. After that, I need the string’s length in bytes, including the BOM if one is present. How can this be done in Python?

Also for didactic purposes, what does a byte list representation of a UTF-8 string look like? I am curious how a UTF-8 string is represented in Python.

Later edit: pprint does that pretty well.


1 Answer

Editorial Team · Answer 1 · 2026-06-09T22:33:08+00:00

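# Python 2: `string` is a byte string (str), not a unicode object.
try:
    string.decode('utf-8')
    # len() of a byte string counts bytes, including a BOM if one is present.
    print "string is UTF-8, length %d bytes" % len(string)
except UnicodeError:
    print "string is not UTF-8"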

In Python 2, str is a sequence of bytes and unicode is a sequence of characters. You use str.decode to decode a byte sequence to unicode, and unicode.encode to encode a sequence of characters to str. So for example, u"é" is the unicode string containing the single character U+00E9 and can also be written u"\xe9"; encoding into UTF-8 gives the byte sequence "\xc3\xa9".
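To make the byte sequence above concrete, here is a small sketch that lists the byte values of the encoded character. It is written to run on both Python 2 and Python 3; the variable names are only illustrative, and going through bytearray is one of several ways to get integers out of a byte string.

```python
# Works on Python 2 and Python 3.
text = u"\xe9"                          # the single character U+00E9
encoded = text.encode("utf-8")          # str in Python 2, bytes in Python 3
byte_values = list(bytearray(encoded))  # integer byte values on either version

print(byte_values)                      # [195, 169], i.e. 0xC3 0xA9
print(len(encoded))                     # 2 bytes for 1 character

# Decoding the bytes gives back the original character string.
assert encoded.decode("utf-8") == text
```

One character becomes two bytes here, which is why the character length and the byte length have to be measured on different objects.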

In Python 3, the names changed: bytes is a sequence of bytes and str is a sequence of characters, so you decode with bytes.decode and encode with str.encode.
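For Python 3, the same check could be sketched roughly as follows. The helper names (is_text, utf8_byte_length, has_utf8_bom) are hypothetical, not part of any library. The sketch assumes the input is a bytes object and that a BOM, if present, is simply part of the data, so len() already counts it.

```python
import codecs


def is_text(value):
    """True if value is a Python 3 str, i.e. a sequence of characters."""
    return isinstance(value, str)


def utf8_byte_length(data):
    """Return the length of data in bytes if it is valid UTF-8, else None.

    data must be a bytes object; a leading BOM is included in the count.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return len(data)


def has_utf8_bom(data):
    """True if data starts with the 3-byte UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


plain = "\u00e9".encode("utf-8")         # b'\xc3\xa9'
with_bom = "\u00e9".encode("utf-8-sig")  # the utf-8-sig codec prepends the BOM

print(utf8_byte_length(plain))      # 2
print(utf8_byte_length(with_bom))   # 5 (3 BOM bytes + 2 bytes for the character)
print(has_utf8_bom(with_bom))       # True
print(list(with_bom))               # [239, 187, 191, 195, 169]
print(utf8_byte_length(b"\xe9"))    # None: a lone 0xE9 is not valid UTF-8
```

Note that a successful decode only shows the bytes are valid UTF-8; it cannot prove they were written as UTF-8, since plain ASCII and some other byte sequences also decode cleanly.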
