The Archive Base

Editorial Team
Asked: May 26, 2026

I am trying to check whether a certain word appears on a page, for many sites. The script runs fine for about 15 sites and then stops with this error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15344: invalid start byte

I searched Stack Overflow and found many questions about this error, but I can’t work out what went wrong in my case.

I would like to either fix it or, if a site raises an error, skip that site. Please advise how I can do this; I am new to this, and the code below took me a day to write. The site the script halted on was http://www.homestead.com

import urllib
import re

filetocheck = open("bloglistforcommenting", "r")
resultfile = open("finalfile", "w")

for countofsites in filetocheck.readlines():
    sitename = countofsites.strip()
    htmlfile = urllib.urlopen(sitename)
    # This line raises UnicodeDecodeError when the page is not valid UTF-8
    page = htmlfile.read().decode('utf8')
    match = re.search("Enter your name", page)
    if match:
        print "match found  : " + sitename
        resultfile.write(sitename + "\n")
    else:
        print "sorry did not find the pattern " + sitename

print "Finished Operations"
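
For context on the error itself: byte 0x96 can never start a UTF-8 sequence, but in Windows-1252 it is an en dash, so the page was probably saved in a Windows code page. That is a guess about this particular site, not something confirmed here. A minimal sketch of a tolerant decode, written for Python 3 with the standard library only, might look like the following. The helper name `decode_page` and the fallback order are assumptions made for illustration.

```python
def decode_page(raw, encodings=("utf-8", "cp1252")):
    """Decode raw page bytes, trying each candidate encoding in order.

    The candidate list is an assumption; adjust it for the sites you check.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, replacing undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


# Bytes that are not valid UTF-8 because of the lone 0x96 byte.
sample = b"Enter your name \x96 required"
text = decode_page(sample)
print("Enter your name" in text)
```

Trying strict UTF-8 first keeps correctly encoded pages intact; the fallbacks only apply to pages that would otherwise raise the error.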

Following Mark’s comments, I changed the code to use BeautifulSoup:

htmlfile = urllib.urlopen("http://www.homestead.com")
page = BeautifulSoup((''.join(htmlfile)))
print page.prettify() 

Now I am getting this error:

page = BeautifulSoup((''.join(htmlfile)))
TypeError: 'module' object is not callable

I am trying their quick start example from http://www.crummy.com/software/BeautifulSoup/documentation.html#Quick%20Start. If I copy paste it then the code works fine.
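
The import line is not shown above, so this is an assumption: the error usually means the code ran `import BeautifulSoup`, which binds the name to the module, and then called the module instead of the class inside it. The final code below uses `from BeautifulSoup import BeautifulSoup`, which is consistent with that. The sketch below reproduces the same mistake in Python 3 with the standard library's `datetime`, used here only as a stand-in because it also has a module and a class with the same name.

```python
import datetime  # binds the name "datetime" to the module, not the class


def call_module():
    """Call the module itself, the same mistake as BeautifulSoup(...) after `import BeautifulSoup`."""
    try:
        datetime(2020, 1, 1)
    except TypeError as exc:
        return str(exc)
    return None


# Importing the class explicitly gives a callable object.
from datetime import datetime as DateTimeClass


def call_class():
    """Call the class imported from the module; this works."""
    return DateTimeClass(2020, 1, 1).year


error_message = call_module()
year = call_class()
print(error_message)
print(year)
```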

I FINALLY got it to work. Thank you all for your help. Here is the final code.

import urllib
import re
from BeautifulSoup import BeautifulSoup

filetocheck = open("listfile", "r")
resultfile = open("finalfile", "w")
error = "for errors"  # not used below

for countofsites in filetocheck.readlines():
    sitename = countofsites.strip()
    htmlfile = urllib.urlopen(sitename)
    page = BeautifulSoup(''.join(htmlfile))
    pagetwo = str(page)
    match = re.search("Enter YourName", pagetwo)
    if match:
        print "match found  : " + sitename
        resultfile.write(sitename + "\n")
    else:
        print "sorry did not find the pattern " + sitename

print "Finished Operations"
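
The final code still stops on the first site that fails to download or decode. The question also asked how to skip such a site, so here is a rough Python 3 sketch of that, using only the standard library. The names `fetch_bytes` and `find_matching_sites`, the example URLs, and the choice of exceptions to catch are all assumptions for illustration, not part of the original script.

```python
import re
import urllib.request


def fetch_bytes(url, timeout=10):
    """Download a page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def find_matching_sites(sites, pattern, fetch=fetch_bytes):
    """Return (matches, skipped) for the given site list.

    A site is skipped, not fatal, when fetching or decoding fails.
    OSError covers urllib's URLError and socket errors; ValueError covers
    UnicodeDecodeError and malformed URLs. Other failures still propagate.
    """
    matches, skipped = [], []
    for site in sites:
        site = site.strip()
        if not site:
            continue
        try:
            page = fetch(site).decode("utf-8")
        except (OSError, ValueError) as exc:
            skipped.append((site, type(exc).__name__))
            continue
        if re.search(pattern, page):
            matches.append(site)
    return matches, skipped


# Offline demonstration: a fake fetcher stands in for real downloads.
fake_pages = {
    "http://a.example": b"<p>Enter your name</p>",
    "http://b.example": b"bad \x96 bytes",
    "http://c.example": b"<p>nothing here</p>",
}
matches, skipped = find_matching_sites(
    list(fake_pages), "Enter your name", fetch=fake_pages.__getitem__
)
print(matches)
print(skipped)
```

Passing the fetcher as a parameter keeps the loop testable without network access; with real sites, leave `fetch` at its default.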
1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 8:46 am

    Many web pages are encoded incorrectly. For parsing HTML try BeautifulSoup as it can handle many types of incorrect HTML that are found in the wild.

    Beautiful Soup is a Python HTML/XML parser designed for quick
    turnaround projects like screen-scraping. Three features make it
    powerful:

    1. Beautiful Soup won’t choke if you give it bad markup. It yields a
      parse tree that makes approximately as much sense as your original
      document. This is usually good enough to collect the data you need and
      run away.

    2. Beautiful Soup provides a few simple methods and Pythonic
      idioms for navigating, searching, and modifying a parse tree: a
      toolkit for dissecting a document and extracting what you need. You
      don’t have to create a custom parser for each application.

    3. Beautiful Soup automatically converts incoming documents to Unicode
      and outgoing documents to UTF-8. You don’t have to think about
      encodings, unless the document doesn’t specify an encoding and
      Beautiful Soup can’t autodetect one. Then you just have to specify
      the original encoding.

    Emphasis mine.

