Question

Asked: May 11, 2026 (2026-05-11T12:55:19+00:00)

The following Python code uses BeautifulStoneSoup to fetch the LibraryThing API information for Tolkien’s ‘The Children of Húrin’.

import urllib2
from BeautifulSoup import BeautifulStoneSoup

URL = ('http://www.librarything.com/services/rest/1.0/'
       '?method=librarything.ck.getwork&id=1907912'
       '&apikey=2a2e596b887f554db2bbbf3b07ff812a')

soup = BeautifulStoneSoup(urllib2.urlopen(URL),
                          convertEntities=BeautifulStoneSoup.ALL_ENTITIES)
title_field = soup.find('field', attrs={'name': 'canonicaltitle'})
print title_field.find('fact').string

Unfortunately, instead of ‘Húrin’, it prints out ‘HÃºrin’. This is obviously an encoding issue, but I can’t work out what I need to do to get the expected output. Help would be greatly appreciated.

1 Answer

score 0 · Answer 1 · 2026-05-11T12:55:20+00:00

Added an answer on May 11, 2026 at 12:55 pm

In the source of the web page it looks like this: The Children of HÃºrin. So the encoding is already broken somewhere on their side before it even gets converted to XML…
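To illustrate how this kind of garbling typically arises, here is a minimal sketch. It assumes (this is not confirmed by LibraryThing) that the title's UTF-8 bytes were decoded as Latin-1 somewhere upstream. It is written for Python 3, and the variable names are only illustrative.

```python
# Assumption: somewhere upstream, UTF-8 bytes were decoded as Latin-1.
original = "H\u00farin"                    # 'Húrin'

utf8_bytes = original.encode("utf-8")      # 'ú' becomes the two bytes C3 BA
garbled = utf8_bytes.decode("latin-1")     # each byte is read as one character

print(garbled)                             # the two-character sequence replaces 'ú'
```

Under that assumption the single character ‘ú’ turns into the two characters ‘Ã’ and ‘º’, which matches what the API returns.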

If it’s a general issue with all the books and you need to work around it, this seems to work:

unicode(title_field.find('fact').string).encode('latin1').decode('utf-8')
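The one-liner above is Python 2. A rough Python 3 equivalent might look like the sketch below. The helper name `fix_mojibake` is hypothetical, and the guard is an added assumption: it returns the text unchanged when the Latin-1/UTF-8 round trip fails, so strings that are already correct are usually left alone.

```python
def fix_mojibake(text):
    """Undo 'UTF-8 read as Latin-1' garbling (hypothetical helper).

    Returns the text unchanged if the round trip does not apply.
    """
    try:
        # Recover the original bytes, then decode them as the UTF-8 they were.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not garbled in this particular way; leave it as it is.
        return text


# Usage: the garbled title as returned by the API.
print(fix_mojibake("The Children of H\u00c3\u00barin"))
```

The guard is not foolproof: a correct string whose Latin-1 bytes happen to form valid UTF-8 would still be altered, so this is best applied only to fields known to be affected.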
