I’m using BeautifulSoup to scrape a Swedish web page. On the web page, the information I want to extract looks like this:
"Öhman Företagsobligationsfond"
When I print the information from the Python script it looks like this:
"Öhman Företagsobligationsfond"
I’m new to Python. I have searched for answers and tried putting # -*- coding: utf-8 -*- at the beginning of the code, but it does not work.
I’m thinking of moving from Sweden to solve this issue.
When you use # -*- coding: utf-8 -*-, you only specify the encoding of the source code file itself. The page that you are parsing has probably declared a faulty encoding (or none at all), and therefore Beautiful Soup fails to decode it correctly. Try to specify the encoding when building the soup.
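To see why a wrong encoding guess produces garbage, here is a minimal standard-library sketch. It assumes the page bytes are UTF-8 and that they get decoded as Windows-1252 (cp1252); the question does not say which encodings are actually involved, so treat both as illustrative assumptions.

```python
# Assumption: the server sends UTF-8 bytes, but something decodes them as cp1252.
original = "Öhman Företagsobligationsfond"

# What travels over the wire: the text encoded as UTF-8 bytes.
raw_bytes = original.encode("utf-8")

# Decoding with the wrong codec turns each two-byte character into two wrong ones.
garbled = raw_bytes.decode("cp1252")

# Decoding with the right codec recovers the text.
correct = raw_bytes.decode("utf-8")

print(garbled)
print(correct)
```

The source-file coding declaration has no effect on any of this, because the bytes come from the network rather than from the script.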
In Beautiful Soup 4, the parameter is from_encoding, while in version 3 it is fromEncoding.
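A minimal sketch of passing the encoding explicitly with Beautiful Soup 4 might look like the following. The HTML snippet is made up for illustration, and UTF-8 is an assumption about the real page; substitute whatever encoding the site actually uses. Note that from_encoding only applies when the markup is passed as bytes.

```python
from bs4 import BeautifulSoup

# Hypothetical page content, encoded as UTF-8 bytes the way a server might send it.
html_bytes = (
    "<html><body><p>Öhman Företagsobligationsfond</p></body></html>"
).encode("utf-8")

# Tell Beautiful Soup which encoding to use instead of letting it guess.
soup = BeautifulSoup(html_bytes, "html.parser", from_encoding="utf-8")

title = soup.p.get_text()
print(title)
```

With a real page, html_bytes would be the raw response body (for example, the result of read() on a urllib response) rather than an already decoded string.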