How to get a Cyrillic string from a document?

Question


Editorial Team

Asked: May 20, 2026 (2026-05-20T06:39:05+00:00)


How do I get a Cyrillic string from a document?

I have the following code:

import urllib
from BeautifulSoup import BeautifulSoup

page = urllib.urlopen("http://habrahabr.ru/")
soup = BeautifulSoup(page.read())
for topic in soup.findAll(True, 'topic'):
    print topic
    print
raw_input()

There are Cyrillic words on the site, but Python displays the wrong characters.

I would be very grateful for any help with this issue.

PS.

I changed

soup = BeautifulSoup(page.read())

to

soup = BeautifulSoup(page.read(), fromEncoding="utf-8")

and still no results…


1 Answer

Editorial Team · Answer 1 · 2026-05-20T06:39:06+00:00


The data on the HTML page is encoded in UTF-8. It appears that you are printing it to your console, where sys.stdout.encoding is cp1251. That accounts for the rubbish that you are seeing.

Here are the results of inspecting the first 8 bytes of the first topic, using IDLE:

>>> raw = '\xd0\x90\xd0\xbb\xd0\xb3\xd0\xbe'
>>> print raw.decode('utf8')
Алго
>>> print raw.decode('cp1251')
РђР»РіРѕ
>>>
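The session above can be reproduced and extended with a short script. What follows is a minimal sketch in Python 3 syntax (the question itself uses Python 2), not a definitive fix for the original program. The helper `to_console_bytes` is a hypothetical name introduced here for illustration. It assumes the text has already been decoded from UTF-8 and only needs to be encoded for whatever console it is written to.

```python
import sys

# The first 8 bytes of the first topic, as quoted in the session above.
raw = b'\xd0\x90\xd0\xbb\xd0\xb3\xd0\xbe'

# Decoding with the page's actual encoding yields the intended text.
text = raw.decode('utf-8')

# Decoding the same bytes as cp1251 reproduces the garbled output.
garbled = raw.decode('cp1251')

# The damage is reversible as long as no bytes were lost:
# encode back to cp1251, then decode as UTF-8.
repaired = garbled.encode('cp1251').decode('utf-8')


def to_console_bytes(s, encoding=None):
    """Hypothetical helper: encode text for the console.

    Falls back to sys.stdout.encoding, then to UTF-8 (an assumption),
    and replaces characters the target encoding cannot represent.
    """
    encoding = encoding or sys.stdout.encoding or 'utf-8'
    return s.encode(encoding, errors='replace')
```

For a cp1251 console, `to_console_bytes(text, 'cp1251')` produces the four single-byte Cyrillic characters that the console expects, instead of the eight UTF-8 bytes that it misreads. In the asker's Python 2 setup, the analogous step would presumably be to print a unicode object rather than a UTF-8 byte string, so that the console's own encoding is applied on output.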
