The Archive Base

Editorial Team
Asked: June 4, 2026

I try to write a script to crawl my site. But I stuck to

I am trying to write a script to crawl my site, but I am stuck at the “if statement” on line 15: the comparison does not seem to work.
I suspect an encoding problem, or that the value contains other characters, but that is only a guess.
The document is encoded as ANSI and the website uses ISO-8859-15.

HParser.py:

from HTMLParser import HTMLParser
from htmlentitydefs import name2codepoint
import urllib2

url = 'http://DOMAIN.TLD'
req = urllib2.Request(url)
response = urllib2.urlopen(req)
the_page = response.read()

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        tag = unicode(tag)
        tag = tag.strip()
        print "'",tag,"'"
        if tag == 'a':
            for attr in attrs:
                if 'src' == attr[0]:
                    print 'Link: ', attr[1]

    def handle_endtag(self, tag):
        pass

    def handle_data(self, data):
        pass

    def handle_comment(self, data):
        pass

    def handle_entityref(self, name):
        pass

    def handle_charref(self, name):
        pass

    def handle_decl(self, data):
        pass

parser = MyHTMLParser()
parser.feed(the_page)

1 Answer

  1. Editorial Team
    Added an answer on June 4, 2026 at 11:43 pm

    I tested your code a little, using the Stack Overflow main page as the URL. Here is what I found:

    1) tag == 'a' correctly evaluates to True when the tag is ‘a’.

    2) Each attr prints as a (name, value) tuple, as you expected. For example:

    ('href', 'http://creativecommons.org/licenses/by-sa/3.0/')
    ('class', 'cc-wiki-link')
    

    I think this means that you simply never get a tuple on an <a> tag whose first element is ‘src’. When I parsed the Stack Overflow main page, I did not get any attr with attr[0] equal to ‘src’ either.
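One way to check this on your own page is to record which attribute names each tag actually carries. The sketch below is only an illustration: it assumes Python 3 (`html.parser` rather than the Python 2 `HTMLParser` module used in the question), and both the class name `AttrNameCollector` and the sample markup are made up for the example.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AttrNameCollector(HTMLParser):
    """Hypothetical helper: records the attribute names seen on each tag."""

    def __init__(self):
        super().__init__()
        self.names_by_tag = defaultdict(set)

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; the parser passes
        # tag names and attribute names in lowercase.
        for name, _value in attrs:
            self.names_by_tag[tag].add(name)


# Sample markup invented for this example, not taken from any real site.
sample_html = '<a href="/about" class="nav">About</a><img src="/logo.png" alt="Logo">'

collector = AttrNameCollector()
collector.feed(sample_html)

for tag, names in sorted(collector.names_by_tag.items()):
    print(tag, sorted(names))
```

With this sample input, ‘src’ shows up only under img, while the a tag carries ‘href’ and ‘class’.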

    In short, the problem is with the second if condition, if 'src' == attr[0], not with the tag comparison.

    I don’t know HTML well enough to say whether the ‘src’ attribute ever goes with the <a> tag, but I usually see ‘src’ with the <img> tag and ‘href’ with the <a> tag. So you may want to change that condition to if attr[0] == 'href' instead.
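A minimal sketch of the suggested change, assuming Python 3 (`html.parser`) instead of the Python 2 modules in the question. The class name `LinkParser`, the `links` list, and the sample markup are hypothetical and only serve the illustration.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Hypothetical parser that collects the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            # value is None for an attribute written without a value
            if name == 'href' and value is not None:
                self.links.append(value)


# Sample markup invented for this example.
sample_html = (
    '<p><a href="/one">One</a> <img src="/x.png"> '
    '<a name="top">Top</a> <a href="http://DOMAIN.TLD/two">Two</a></p>'
)

parser = LinkParser()
parser.feed(sample_html)
print(parser.links)
```

To run this against a fetched page under Python 3, note that `response.read()` returns bytes, so the page would need decoding before `feed()`, for example with `the_page.decode('iso-8859-15')` if the site really uses that encoding.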

© 2021 The Archive Base. All Rights Reserved