I’m scraping a website with Python 2.7 using BeautifulSoup

Asked: June 10, 2026 (2026-06-10T07:08:16+00:00)

I’m scraping a website with Python 2.7 using BeautifulSoup. Here’s my code:

# -*- coding: utf-8 -*-

from BeautifulSoup import BeautifulSoup
import urllib
import json

url = 'http://www.website.com'
file_pointer = urllib.urlopen(url)
html_object = BeautifulSoup(file_pointer)

type_select = html_object('select',{'id':'which'})

for option in type_select:
    value = option('option')
    for type_value in value:
        type =  type_value.contents[0]
        param_1 = type_value['value']
        print 'Type:', type

        url2 = 'http://www.website.com/' + param_1
        file_pointer2 = urllib.urlopen(url2)
        html_object2 = BeautifulSoup(file_pointer2)
        result = json.loads(str(html_object2))

        for json1 in result['DATA']:
            category = json1[0].title()
            param_2 = json1[0]
            print '   Category:', category

            url3 = 'http://www.website.com/' + param_2 + '&which=' + param_1
            file_pointer3 = urllib.urlopen(url3)
            html_object3 = BeautifulSoup(file_pointer3)
            result2 = json.loads(str(html_object3))

            for json2 in result2['DATA']:
                sub_category = json2[0]
                param_3 = sub_category.replace(' ','+').replace('&','%26')
                print '       sub_category:', sub_category

                for i in param_3:
                    if i == 'â':
                        print i
  ...

I need to replace the 'â' character for a fourth URL request to continue my scrape, but no matter what I try to replace (u'\u2019', â, etc.), I get a UnicodeEncodeError.

I tried converting param_3 to a string (because it is a BeautifulSoup NavigableString) and then replacing, but I get the same error, this time on my str(param_3) line. Finally, I tried the for-loop comparison shown above and got this warning:

UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
if i == 'â':

I’m at a loss here. How can I translate this character and replace it with other characters in param_3?

Any help is appreciated! Thanks in advance!

1 Answer

Editorial Team · Answer 1 · 2026-06-10T07:08:18+00:00

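BeautifulSoup returns Unicode strings, so use Unicode strings (for example u'\u2019') when comparing against them or replacing parts of them. Comparing a Unicode string with a non-ASCII byte string such as 'â' is what triggers the UnicodeWarning. Also check out urllib.quote_plus. It looks like it does the replacements you want (spaces become +, & becomes %26). In Python 2 you’ll need to .encode the Unicode string, typically as UTF-8, before passing it to quote_plus.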
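
Here is a minimal sketch of that suggestion, assuming the target server expects UTF-8 in its query strings. The helper name encode_param and the sample sub-category are hypothetical and not taken from the question. The import fallback is only there so the snippet also runs on Python 3.

```python
# -*- coding: utf-8 -*-
try:
    from urllib import quote_plus        # Python 2.7
except ImportError:
    from urllib.parse import quote_plus  # Python 3


def encode_param(text, replacement=None):
    """Percent-encode a Unicode string for use in a query string.

    If `replacement` is given, U+2019 (right single quotation mark) is
    swapped for it first. Both arguments must be Unicode strings.
    """
    if replacement is not None:
        # Unicode-to-Unicode replace, so no implicit ASCII conversion happens.
        text = text.replace(u'\u2019', replacement)
    # Assumption: the server expects UTF-8 bytes in the URL.
    return quote_plus(text.encode('utf-8'))


# Hypothetical sub-category containing the problem character.
sub_category = u'Men\u2019s Shoes & Boots'

print(encode_param(sub_category))        # Men%E2%80%99s+Shoes+%26+Boots
print(encode_param(sub_category, u"'"))  # Men%27s+Shoes+%26+Boots
```

In the question's loop this would stand in for the chained .replace(' ','+').replace('&','%26') calls on sub_category. Whether to keep the character percent-encoded or swap it for a plain apostrophe depends on what the site's fourth URL actually accepts, which the question does not show.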
