I am using BeautifulSoup to scrape data from a webpage. I want to compare

Question

Asked: May 28, 2026 (2026-05-28T18:27:34+00:00)

I am using BeautifulSoup to scrape data from a webpage. I want to compare the website data with text that is in a .txt document. However, I seem to be having encoding issues.

The website has the text “heat oven to 400°”. The text also appears this way in “view source” (no HTML entities).

The website is read using BeautifulSoup:

source = "my url".read()
....
soup = BeautifulSoup(source)

The text document was created as a new file using the editor's “Encode in UTF-8 without BOM” option. I then copy-pasted “heat oven to 400°” from the website into the file and saved it.

The text file is read as:

f = codecs.open('myfilename', encoding='utf-8')

When I compare the two strings, they are not equal, but I want them to be.

To see what is going on: In Eclipse, I split the two texts and, looking at the variables in debug mode, I see that the degree sign from BeautifulSoup appears as \xc2 \xb0. The degree sign from the text doc just appears as \xb0.

Why, and how do I fix it? I’m having this issue with many special chars so I need a general solution. Also, I will be copy-pasting data from several sites into the text doc.

1 Answer

Editorial Team · Answer 1 · 2026-05-28T18:27:35+00:00

It looks like Beautiful Soup doesn't have what it needs to detect the encoding correctly. You can give it a hint by replacing BeautifulSoup(source) with BeautifulSoup(source, fromEncoding='UTF-8'). (In Beautiful Soup 4 the keyword is spelled from_encoding.) More options and information are in the documentation section “Beautiful Soup Gives You Unicode, Dammit”.
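If you would rather not depend on the parser's detection at all, one option is to decode the downloaded bytes yourself and compare Unicode text with Unicode text. The following is only a sketch under assumptions: it uses the standard library alone, the page bytes and the file name reference.txt are hypothetical stand-ins for your real download and your real .txt file, and it assumes the page really is served as UTF-8. The decoded string can also be passed to BeautifulSoup in place of the raw bytes.

```python
import codecs
import os
import tempfile

# Hypothetical stand-in for the raw bytes returned by the HTTP read() call.
source = u"<p>heat oven to 400\u00b0</p>".encode("utf-8")

# Decode explicitly, so nothing downstream has to guess the encoding.
page_text = source.decode("utf-8")

# Hypothetical stand-in for the reference file saved as UTF-8 without BOM.
path = os.path.join(tempfile.mkdtemp(), "reference.txt")
with codecs.open(path, "w", encoding="utf-8") as f:
    f.write(u"heat oven to 400\u00b0")

# Read the file the same way the question does.
with codecs.open(path, encoding="utf-8") as f:
    file_text = f.read()

# Both sides are now Unicode strings, so the degree sign compares equal.
matches = file_text in page_text
```

If different sites use different encodings, the decode step would need the encoding declared by each site rather than a hard-coded 'utf-8'.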

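The two characters '\xc2\xb0' are what you get when the UTF-8 encoding of Unicode code point U+00B0 (the degree sign), which is the byte pair C2 B0, is decoded as Windows-1252, Beautiful Soup's last-resort guess at the encoding. Each byte is then treated as a separate character, so the single degree sign turns into two. The text file, decoded correctly as UTF-8, yields the single character '\xb0'.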
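The mix-up can be reproduced with the standard library alone. This is a small illustration, not part of the original answer; the variable names are made up for the example.

```python
# The degree sign as a single Unicode code point.
degree = u"\u00b0"

# Its UTF-8 encoding is two bytes: C2 B0.
utf8_bytes = degree.encode("utf-8")

# Decoding those bytes as Windows-1252 turns each byte into its own
# character, which is the two-character string seen in the debugger.
misread = utf8_bytes.decode("cp1252")

# Reversing the wrong decode and applying the right one restores the text.
repaired = misread.encode("cp1252").decode("utf-8")
```

The round trip in the last line is useful for diagnosis, but it is not a general repair: it fails for text containing bytes that Windows-1252 leaves undefined, so telling the parser the correct encoding up front is the more reliable fix.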
