I am reading a file in utf-8 into unicode and I do not get

Asked: May 31, 2026

I am reading a file in utf-8 into unicode and I do not get any errors.

try:
        f = codecs.open(fil_name, "r","utf-8")
        f_str = f.read()

That is, the string f_str is in “unicode”
Later in the program I have to send the (u) string in f_str to a socket. I am trying to convert the string back to “utf-8”.

usock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
usock.connect(("xxx server", 123))
usock.send("TEXT %s\nENDQ\n" % f_str.replace("\n", " ").encode("utf-8"))

Here I get an error message:

usock.send("TEXT %s\nENDQ\n" % text.replace("\n", " ").encode("utf-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 41: ordinal not in range(128)

My text contains characters that cannot be encoded in pure ASCII (ä, ö, ...), but that should not be a problem with utf-8 or latin-1.
Why am I getting this error? I am not using ASCII, I am using unicode/utf-8.

1 Answer

Editorial Team · Answer 1 · 2026-05-31T16:52:24+00:00

Your string literal is a byte string. When you interpolate into it and the operands mix byte strings and unicode, Python 2 implicitly converts between the two using the default encoding (ascii), which fails on any non-ASCII data such as the byte 0xc3.
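
As a rough illustration (not taken from the original post), the implicit conversion can be reproduced explicitly. The sketch assumes the offending data is UTF-8 encoded bytes, and the names raw and message are made up for the example. Decoding such bytes with the ascii codec, which is what Python 2 does behind the scenes when it has to turn a byte string into unicode, fails on the first non-ASCII byte:

```python
# UTF-8 bytes for the character "ä" (U+00E4); the first byte is 0xc3.
raw = u"\u00e4".encode("utf-8")

try:
    # Explicit version of the implicit ascii decode done by Python 2.
    raw.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    # Same kind of error as in the question's traceback.
    message = str(exc)

print(message)
```

The position reported in the message depends on where the first non-ASCII byte sits in the data.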

There are a couple of ways to fix this. One is just use Python 3. 😉

If you are using Python 2 then put the following at the top of the source file:

from __future__ import unicode_literals

Then your literal will be unicode also.

You could also prefix the string with a ‘u’.

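Another problem with that line is precedence. The method call .encode("utf-8") binds more tightly than the ‘%’ operator, so only the right-hand side is encoded and the ‘%s’ formatting runs afterwards. Any implicit conversion between unicode and byte strings at that point uses the ascii codec. Format first, then encode the complete result.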

So, try this:

(u"TEXT %s\nENDQ\n" % f_str.replace(u"\n", u" ")).encode("utf-8")
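
A minimal sketch of how that expression might be wrapped before sending, assuming the text is already a unicode string as in the question. The helper name build_message is hypothetical, and the sample text is invented for the example:

```python
def build_message(text):
    """Return the bytes to send for a unicode text (hypothetical helper)."""
    # Do all formatting on unicode objects, then encode exactly once.
    return (u"TEXT %s\nENDQ\n" % text.replace(u"\n", u" ")).encode("utf-8")


# Sample text containing "ä" and "ö" and a newline.
payload = build_message(u"h\u00e4\n\u00f6")

# The payload is a byte string and can be passed to the socket from the
# question, e.g. usock.send(payload).
print(repr(payload))
```

Keeping everything unicode until the last step means there is only one place where an encoding is chosen, so no implicit ascii conversion can slip in.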
