I’m running into an encoding issue with BeautifulSoup. I’m trying to parse Open Graph titles, but it’s leaving out non-ASCII characters.
from bs4 import BeautifulSoup
doc = BeautifulSoup(html, "lxml")
doc.html.head.find_all('meta', attrs={'property': 'og:title'})
For http://mattilintulahti.net/mediablogi/2013/02/11/19-asiaa-joita-et-tieda-mediayhtiosta-nimeltaan-red-bull/ it prints out the following for the content
19 asiaa joita et tied mediayhtist nimeltn Red Bull
whereas the correct title is
19 asiaa joita et tiedä mediayhtiöstä nimeltään Red Bull
Any advice on how to get UTF-8 to work properly?
I’m not able to reproduce the problem:
yields
If this doesn’t help, please show your code.
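One thing worth checking in the meantime. The code that fetches the page is not shown, so this is only an assumption, but the output in the question is exactly what you get when UTF-8 bytes are decoded as ASCII with errors ignored somewhere before or after parsing. The sketch below uses only the standard library and the title string from the question; the variable names are made up for illustration.

```python
# Hypothetical reproduction: the title from the question, as UTF-8 bytes,
# which is how it would arrive over the network.
title = "19 asiaa joita et tiedä mediayhtiöstä nimeltään Red Bull"
raw = title.encode("utf-8")

# Assumed cause: decoding as ASCII while ignoring errors silently
# drops every byte of each non-ASCII character.
lossy = raw.decode("ascii", errors="ignore")

# Decoding with the encoding the bytes were actually written in
# keeps the characters intact.
correct = raw.decode("utf-8")

print(lossy)
print(correct)
```

If something like this is happening in your code, try handing the raw, undecoded bytes of the response to BeautifulSoup and letting it work out the encoding. It also accepts a `from_encoding` argument if detection guesses wrong.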