I'm loading a web page using urllib. It contains Russian characters, but the page encoding is 'utf-8'

Question

Editorial Team

Asked: May 15, 2026 (2026-05-15T00:13:50+00:00)

I'm loading a web page using urllib. The page contains Russian characters, and its encoding is 'utf-8'. Both of my attempts fail:

Attempt 1:

pageData = unicode(requestHandler.read()).decode('utf-8')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 262: ordinal not in range(128)

Attempt 2:

pageData = requestHandler.read()
soupHandler = BeautifulSoup(pageData)
print soupHandler.findAll(...)

UnicodeEncodeError: 'ascii' codec can't encode characters in position 340-345: ordinal not in range(128)

1 Answer

Editorial Team · Answer 1 · 2026-05-15T00:13:51+00:00

In your first snippet, the call unicode(requestHandler.read()) tells Python to convert the byte string returned by read() into unicode. Since no encoding is specified for the conversion, ascii gets tried (and fails). Execution never reaches the .decode call (which would make no sense on that unicode object anyway).

Either use unicode(requestHandler.read(), 'utf-8') or requestHandler.read().decode('utf-8'): both should produce a correct unicode object if the encoding is indeed utf-8 (the 0xD0 byte is consistent with that, since it is a lead byte for many UTF-8-encoded Cyrillic letters, but a single non-ascii byte shown out of context cannot confirm the encoding).
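As a minimal sketch of the explicit-decode approach, the snippet below uses a hard-coded byte string as a stand-in for requestHandler.read(). The helper name decode_page and the sample word are assumptions made for illustration, not part of the original code. It is written to run on both Python 2 and Python 3.

```python
# Stand-in for requestHandler.read(): the UTF-8 bytes of the Russian
# word "Privet" (U+041F U+0440 U+0438 U+0432 U+0435 U+0442).
SAMPLE_TEXT = u'\u041f\u0440\u0438\u0432\u0435\u0442'
raw = SAMPLE_TEXT.encode('utf-8')


def decode_page(data, encoding='utf-8'):
    """Decode raw page bytes using an explicitly named encoding."""
    return data.decode(encoding)


page_text = decode_page(raw)

# Decoding as ASCII reproduces the kind of error shown in the question.
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# The first byte of the encoded Cyrillic letter U+041F is 0xD0.
first_byte = bytearray(raw)[0]
```

If the page declares a different charset, the same helper can be called with that name, for example decode_page(raw, 'cp1251').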

Printing Unicode data is a different issue: it requires a well-configured and cooperative terminal emulator, one that lets Python set sys.stdout.encoding on startup. For example, on a Mac, using Apple's Terminal.app:

>>> import sys
>>> sys.stdout.encoding
'UTF-8'

so the printing of Unicode objects works fine here:

>>> print u'\xabutf8\xbb'
«utf8»

as does the printing of utf8-encoded byte strings:

>>> print u'\xabutf8\xbb'.encode('utf8')
«utf8»

but on other machines only the latter will work, and only if you encode with the terminal emulator's own encoding, which you need to discover on your own because the terminal emulator isn't telling Python ;-).
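One way to avoid relying on the implicit conversion is to encode the text yourself before writing it out. The sketch below is only an illustration: the helper name encode_for_terminal, the 'utf-8' fallback, and the choice of errors='replace' are assumptions, not something the original answer prescribes. It runs on both Python 2 and Python 3.

```python
import sys


def encode_for_terminal(text, encoding=None, fallback='utf-8'):
    """Encode text for output.

    Uses the given encoding, else whatever Python detected for the
    terminal (sys.stdout.encoding), else the assumed fallback.
    """
    if encoding is None:
        encoding = getattr(sys.stdout, 'encoding', None) or fallback
    # 'replace' substitutes '?' for characters the target encoding
    # cannot represent, instead of raising UnicodeEncodeError.
    return text.encode(encoding, 'replace')


utf8_bytes = encode_for_terminal(u'\xabutf8\xbb', encoding='utf-8')
ascii_bytes = encode_for_terminal(u'\xabutf8\xbb', encoding='ascii')
```

With an ASCII-only target the guillemets degrade to question marks rather than raising, which is usually preferable for diagnostic printing but loses information.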
