I am reading a file in utf-8 into unicode and I do not get any errors.
import codecs

# Decode the file from UTF-8 while reading; f_str is a unicode string.
with codecs.open(fil_name, "r", "utf-8") as f:
    f_str = f.read()
That is, the string f_str is in “unicode”.
Later in the program I have to send the unicode string in f_str to a socket, so I am trying to convert it back to “utf-8”.
usock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
usock.connect(("xxx server", 123))
usock.send("TEXT %s\nENDQ\n" % f_str.replace("\n", " ").encode("utf-8"))
Here I get an error message:
usock.send("TEXT %s\nENDQ\n" % text.replace("\n", " ").encode("utf-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 41: ordinal not in range(128)
My text contains characters that cannot be encoded in pure ASCII (äö..), but they are not a problem in utf-8 or latin-1.
Why am I getting this error? I am not using ASCII, I am using unicode/utf-8.
Your string literal is a byte string. When you mix byte strings and unicode strings in one operation, such as % interpolation, Python 2 implicitly converts between them using the default encoding (ascii), and that fails on the first non-ASCII byte.
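To see what that implicit conversion amounts to, here is a minimal sketch that performs the same ascii decode by hand. The variable names are made up for illustration, and the snippet behaves the same on Python 2 and Python 3:

```python
# UTF-8 bytes for a text containing a non-ASCII character (a-umlaut).
utf8_bytes = u"gr\u00e4s".encode("utf-8")

# Python 2 does this decode silently whenever a byte string meets a
# unicode string; done explicitly, it fails in the same way.
try:
    utf8_bytes.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)
```

The message printed is the same kind of UnicodeDecodeError as in the traceback, with the position pointing at the first non-ASCII byte.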
There are a couple of ways to fix this. One is to just use Python 3. 😉
If you are using Python 2 then put the following at the top of the source file:
from __future__ import unicode_literals
Then your literal will be unicode also.
You could also prefix the string with a ‘u’.
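Another problem with that line is precedence. The method calls on the right side (.replace() and .encode()) are evaluated first, so only the interpolated text gets encoded. The ‘%s’ format operation runs after the right side is complete, and whenever it has to mix a byte string with unicode it converts implicitly, using the ascii codec.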
So, try this:
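A minimal sketch of that, assuming f_str really is a unicode string as read above. build_message is a hypothetical helper name, and the actual send is left commented out because it needs the connected socket from the question:

```python
def build_message(text):
    """Return the bytes to send for a unicode string `text` (hypothetical helper)."""
    # Format while everything is still unicode; the u prefix keeps the
    # literal from being a byte string on Python 2.
    message = u"TEXT %s\nENDQ\n" % text.replace(u"\n", u" ")
    # Encode the complete message once, right before it goes on the wire.
    return message.encode("utf-8")


payload = build_message(u"f\u00f6rsta raden\nandra raden")
# usock.send(payload)  # usock: the connected socket from the question
```

If text turns out to be a byte string already, Python 2 will still attempt the implicit ascii decode and fail, so it may be worth checking type(text) before this step.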