I’m using BeautifulSoup to scrape a page that has ISO-8859-1 encoding, but I’ve run into a little hiccup.
I have a line that reads:
logging.info("Processing [%s]" % (link))
The variable link is one of the values scraped with BeautifulSoup. It is a Unicode string, and I can print it with print link; it shows up on the console exactly as it was scraped. However, the line above throws this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 14: ordinal not in range(128)
I’ve been reading up on Unicode, but I can’t figure out why I can print the string but not log it.
The string in question is this:
booba-concert-à-bercy
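The byte in the error message can be traced back to this string. The sketch below is written for Python 3 (the question uses Python 2). It assumes that link actually held UTF-8 encoded bytes at the point of logging, not a Unicode object; that is an inference from the error, not something the post confirms. In UTF-8, à is encoded as the two bytes 0xc3 0xa0, and it is at index 14 of booba-concert-à-bercy, which matches "byte 0xc3 in position 14". In ISO-8859-1 the same character is the single byte 0xe0, so 0xc3 points to UTF-8 rather than to the page's own encoding.

```python
# Written with an escape so the character is unambiguously U+00E0 (à).
link = "booba-concert-\u00e0-bercy"

utf8_bytes = link.encode("utf-8")
latin1_bytes = link.encode("iso-8859-1")

# 'à' sits at index 14 of the string.
print(link.index("\u00e0"))        # 14
# In UTF-8 it becomes two bytes, the first of which is 0xc3.
print(utf8_bytes[14:16].hex())     # c3a0
# In ISO-8859-1 it is the single byte 0xe0.
print(hex(latin1_bytes[14]))       # 0xe0

# Decoding the UTF-8 bytes as ASCII reproduces the error from the question.
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)
```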
Any ideas on where I’m mucking this up?
Thank you.
I managed to solve this by adding a file called sitecustomize.py in my Python/Lib/site-packages directory. This file contained two lines:
import sys
sys.setdefaultencoding('utf-8')
The default encoding prior to that was ascii, hence the issues. Now I don’t need to specify an explicit encoding for the link variable, as it uses the default encoding, i.e. utf-8, and converts it to that. Of course, I’ll never see the characters properly until my terminal uses the same encoding, but that won’t break my code.
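A more local alternative to changing the process-wide default encoding is to decode explicitly where bytes enter the program and to give the log file its own encoding, so neither step depends on the console. This is a Python 3 sketch under stated assumptions: the helper to_text, the logger name "scraper" and the temporary log file are hypothetical, and the scraped value is assumed to arrive as UTF-8 bytes. It also passes link as a logging argument instead of pre-formatting with %, so the message is only built if the record is actually emitted. Python 3 has no sys.setdefaultencoding and its default encoding is always utf-8, so the sitecustomize.py approach only applies to Python 2.

```python
import logging
import os
import sys
import tempfile


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return str, decoding bytes with an explicit codec."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Python 3: the default encoding is fixed to utf-8.
default_encoding = sys.getdefaultencoding()

# Hypothetical log file in a temporary directory, written as UTF-8
# regardless of the terminal's encoding.
log_path = os.path.join(tempfile.mkdtemp(), "scraper.log")
handler = logging.FileHandler(log_path, encoding="utf-8")
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("scraper")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

# Assumption: the scraped value arrives as UTF-8 encoded bytes.
scraped = "booba-concert-\u00e0-bercy".encode("utf-8")
link = to_text(scraped)

# Pass link as an argument; logging interpolates it only when emitting.
logger.info("Processing [%s]", link)

logger.removeHandler(handler)
handler.close()

# Read the log back with the same explicit encoding.
with open(log_path, encoding="utf-8") as f:
    logged = f.read()
```

In Python 2 the explicit step would be link.decode('utf-8') on a byte string, and logging.FileHandler accepts an encoding argument there as well.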