I am currently using BeautifulSoup to scrape some websites, but I have a problem with some specific characters. The code inside UnicodeDammit seems to indicate that these are (again) some Microsoft-invented ones.
I’m using the newest version of BeautifulSoup (3.0.8.1), as I am still using Python 2.5.
The following code illustrates my problem:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('...Baby One More Time (Digital Deluxe Version…')
print soup
'...Baby One More Time (Digital Deluxe Version…'
As you can see, the problem is the ‘…’ (&hellip;) character at the end (which your browser probably rendered correctly). Obviously that’s not what I am interested in.
It would be nice to have this character’s Unicode representation or something.
Even simply ignoring it would solve my particular problem.
How can I do this with BeautifulSoup?
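To make the two outcomes concrete, here is a minimal sketch of what I mean, done by hand with the standard library rather than with BeautifulSoup. It assumes the text ends up containing the named entity `&hellip;`. `replace_named_entities` is a hypothetical helper written for illustration, not part of any library, and it only handles named entities, not numeric ones.

```python
import re

try:
    from htmlentitydefs import name2codepoint   # Python 2
except ImportError:
    from html.entities import name2codepoint    # Python 3

try:
    unichr
except NameError:
    unichr = chr                                 # Python 3 has no unichr

# Matches named entities such as &hellip; (numeric ones are not handled).
ENTITY = re.compile(r'&(\w+);')


def replace_named_entities(text, drop=False):
    """Hypothetical helper: turn known named entities into their Unicode
    character, or remove them entirely when drop=True.
    Unknown entity names are left untouched."""
    def substitute(match):
        name = match.group(1)
        if name not in name2codepoint:
            return match.group(0)
        if drop:
            return u''
        return unichr(name2codepoint[name])
    return ENTITY.sub(substitute, text)


# Assumption: the scraped text contains the entity rather than the raw character.
title = u'...Baby One More Time (Digital Deluxe Version&hellip;'
converted = replace_named_entities(title)            # entity becomes U+2026
stripped = replace_named_entities(title, drop=True)  # entity is removed
```

Either `converted` or `stripped` would be fine for my purposes. What I am looking for is the equivalent behaviour from BeautifulSoup itself.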
Found the solution myself: