I have a text which contains characters such as \xaf, \xbe, which, as I


Asked: May 20, 2026 (2026-05-20T02:06:13+00:00)


I have a text which contains characters such as “\xaf”, “\xbe”, which, as I understand it from this question, are ASCII encoded characters.

I want to convert them in Python to their UTF-8 equivalents. The usual string.encode("utf-8") throws UnicodeDecodeError. Is there some better way, e.g., with the codecs standard library?

Sample 200 characters here.


1 Answer

Editorial Team · Answer 1 · 2026-05-20T02:06:14+00:00

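Your file is already UTF-8 encoded, so there is nothing to convert: it only needs to be decoded as UTF-8 when you read it. In Python 2, calling encode("utf-8") on a byte string first decodes it implicitly as ASCII, which is where the UnicodeDecodeError comes from.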

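# saved encoding-sample to /tmp/encoding-sample
import codecs
import sys
import unicodedata as ud

# decode the file as UTF-8 while reading it
with codecs.open("/tmp/encoding-sample", "r", "utf8") as fp:
    data = fp.read()

# list every distinct character with its code point and Unicode name
chars = sorted(set(data))
for char in chars:
    try:
        charname = ud.name(char)
    except ValueError:
        # control characters have no name in the Unicode database
        charname = "<unknown>"
    sys.stdout.write("char U%04x %s\n" % (ord(char), charname))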

The output, with the unknown names filled in manually:
char U000a LINE FEED
char U001e INFORMATION SEPARATOR TWO
char U001f INFORMATION SEPARATOR ONE
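
To see why bytes such as \xaf and \xbe can appear in a file that is valid UTF-8, here is a minimal sketch for Python 3. The sample bytes and the describe helper are made up for illustration and are not taken from the file in the question. It assumes those bytes are the second byte of two-byte UTF-8 sequences (\xc2\xaf and \xc2\xbe), which may not match the real data.

```python
import unicodedata

# Hypothetical sample: U+00AF and U+00BE stored as UTF-8.
# In a byte dump they show up as \xc2\xaf and \xc2\xbe.
raw = b"a\xc2\xafb\xc2\xbe\n"


def describe(raw_bytes):
    """Decode UTF-8 bytes and return (code point, name) pairs
    for every non-ASCII character."""
    text = raw_bytes.decode("utf-8")
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch, "<unknown>"))
        for ch in text
        if ord(ch) > 0x7F
    ]


for code_point, name in describe(raw):
    print(code_point, name)
```

Decoding and then re-encoding as UTF-8 gives back the same bytes, which is another way of saying the data needed no conversion in the first place.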
