I’m trying to use urllib and urllib2 to read from a text file that


Asked: May 29, 2026 (2026-05-29T18:53:34+00:00)


I’m trying to use urllib and urllib2 to read from a text file that has French characters in it, like “é”, “à”, and so on.

def load(url):
    from urllib2 import Request, urlopen, URLError, HTTPError

    req = Request(url)

    f = urlopen(req)
    f.readline()

    for line in f:
        line = line.split('\t')
        word = line[0].encode('utf-8')

I have a feeling that reading from the response gives me byte strings, so I use encode('utf-8') to get the Unicode value, but this gives me the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 6: ordinal not in range(128)

Can someone tell me what’s going on? Any help would be appreciated. Thanks!


1 Answer

Editorial Team · Answer 1 · 2026-05-29T18:53:35+00:00

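Yes, you’re reading bytes from the file. What you must do is decode, not encode, the byte string into Unicode. It’s already encoded, you see. If it weren’t, you wouldn’t need to do anything with it. The error mentions the ascii codec because, in Python 2, calling encode() on a byte string first decodes it implicitly as ASCII, and that hidden step is what fails on the first non-ASCII byte.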

word = unicode(line[0], "utf8")
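To make the direction of the conversion concrete, here is a minimal sketch that should run on both Python 2 and Python 3. The sample bytes and the names raw_line, fields, and round_trip are made up for illustration, and it assumes the data is UTF-8:

```python
# Hypothetical sample standing in for one line of the downloaded file.
# Assumption: the file is UTF-8, so "é" arrives as the two bytes C3 A9.
raw_line = b"caf\xc3\xa9\tcoffee\n"

# Split while the data is still bytes, then decode the field you need.
fields = raw_line.split(b"\t")
word = fields[0].decode("utf-8")      # bytes -> Unicode text

# encode() goes the other way: Unicode text -> bytes.
round_trip = word.encode("utf-8")
```

In Python 2, raw_line.decode("utf-8") does the same job as unicode(raw_line, "utf8").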

You have to specify the encoding used in the file. If it’s not utf8, another good suspect might be latin1; the 0xe8 byte in your traceback is “è” in latin1, which hints in that direction. Or, you know, since it’s a Web document, you could fish the document’s encoding out of the headers and/or its content, but that’s a little beyond the scope of your question.
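If you do want to go the header route, something along these lines may be a starting point. It is only a sketch: charset_from_content_type and decode_body are hypothetical helper names, the latin1 default is an assumption that suits this particular guess rather than a general rule, and it reads the charset from a Content-Type value only, not from the document's content:

```python
from email.message import Message


def charset_from_content_type(content_type, default="latin1"):
    """Return the charset named in a Content-Type value, or `default`."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset() or default


def decode_body(raw, content_type):
    """Decode raw bytes using the charset advertised in the header value."""
    return raw.decode(charset_from_content_type(content_type))


# Hypothetical header value and payload, for illustration only.
text = decode_body(b"tr\xe8s\tvery\n", "text/plain; charset=ISO-8859-1")
```

With urllib2, the header value itself should be available as f.info().getheader('Content-Type') on the object returned by urlopen. Servers do not always send a charset, and the one they send is not always right, so treat the result as a hint.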
