The Archive Base

Editorial Team
Asked: May 18, 2026

I’m still a newcomer to Python, so I hope this question isn’t inane.

The more I google for web scraping solutions, the more confused I become (I can’t see the forest, despite investigating many of the trees).

I’ve been reading the documentation for a number of projects, including (but not limited to):
  • Scrapy
  • mechanize
  • spynner

However, I can’t really figure out which hammer I should be trying to use.

There is a specific site I’m trying to crawl (www.schooldigger.com). It uses ASP, and there’s some JavaScript I need to be able to emulate.
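
If the site is an ASP.NET WebForms application (an assumption; the question only says "asp"), much of the JavaScript involved is often the generated __doPostBack(eventTarget, eventArgument) function, which copies its two arguments into the hidden __EVENTTARGET and __EVENTARGUMENT fields and submits the form along with hidden state such as __VIEWSTATE. In that case the script can sometimes be emulated without a JavaScript engine by rebuilding that form body yourself. The following is a minimal standard-library sketch of that idea; HiddenInputParser and postback_body are hypothetical helper names, not part of any library.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputParser(HTMLParser):
    """Collects the name/value pairs of <input type="hidden"> fields."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def postback_body(html, event_target, event_argument=""):
    """Build the urlencoded form body that __doPostBack(target, argument) would submit.

    Only hidden fields (__VIEWSTATE, __EVENTVALIDATION, ...) are copied; values
    of visible controls such as text boxes or drop-downs would have to be added
    separately.
    """
    parser = HiddenInputParser()
    parser.feed(html)
    fields = dict(parser.fields)
    # __doPostBack sets these two hidden fields and then submits the form.
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return urlencode(fields)
```

For example, postback_body(page_html, "ctl00$ResultsGrid", "Page$2") (hypothetical control ID and argument) builds the body that a "page 2" link in a GridView-style pager might send; it would then be POSTed to the form's action URL as application/x-www-form-urlencoded data.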

I’m aware this sort of problem isn’t easily dealt with, so I’d love any guidance.

In addition to some general discussion of the options available (and of the relationships between the different projects, if possible), I have a few specific questions:

  1. When using Scrapy, is there any way to avoid defining the ‘items’ to be parsed and just download the first couple hundred pages or so? I don’t actually want to download entire websites, but I would like to be able to see which pages are being downloaded while developing the scraper.

  2. Regarding mechanize, ASP and JavaScript, please see a question I posted that hasn’t received any answers:
    https://stackoverflow.com/questions/4249513/emulating-js-in-mechanize

  3. Why not build some sort of utility (either a TurboGears application or a browser plug-in) that lets a user graphically select the links to follow and the items to parse? All I’m suggesting is a GUI that sits around a parsing API. I don’t know if I have the technical knowledge to create such a project, but I don’t see why it isn’t possible; in fact, it seems rather feasible given what I know about Python. Could anyone give some feedback about the problems this sort of project would face?

  4. Most importantly, are all web crawlers built to be site-specific? It seems to me that I’m sort of reinventing the wheel in my code (but that’s probably because I’m not very good at programming).

  5. Does anyone have examples of fully featured scrapers? There are lots of examples in the documentation (which I’ve been studying), but they all seem to focus on simplicity, just to demonstrate how to use the package. Maybe I’d benefit from a more detailed, complicated example.
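
For questions 1 and 4, here is a minimal sketch of separating the generic crawl from the site-specific extraction, written with only the Python standard library rather than with Scrapy. crawl, fetch_url and LinkParser are hypothetical names, the UTF-8 decoding is an assumption, and there is no robots.txt handling, throttling or JavaScript support. The crawl loop is the same for any site; the only site-specific part is the optional extract callback, and when it is omitted the function just downloads up to max_pages pages on one host and prints each URL as it goes. (In Scrapy itself, a spider callback may yield only new Requests and no items, and the CLOSESPIDER_PAGECOUNT setting closes the spider after a given number of responses.)

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def fetch_url(url):
    """Default fetcher: download a page, assuming it is UTF-8 encoded."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=200, fetch=fetch_url, extract=None):
    """Breadth-first crawl of start_url's host, stopping after max_pages pages.

    Returns (visited, results): the URLs downloaded, in order, and whatever
    the optional site-specific extract(url, html) callback returned for each.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited, results = [], []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError as exc:  # urllib's URLError and HTTPError subclass OSError
            print(f"skipped {url}: {exc}")
            continue
        visited.append(url)
        print(f"downloaded {url}")
        if extract is not None:
            results.append(extract(url, html))
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments before de-duplicating.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited, results
```

Because crawl accepts any fetch callable, it can be tried against an in-memory dict of pages before pointing it at a real site.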

Thanks for your thoughts.

1 Answer

  1. Editorial Team
    Added an answer on May 18, 2026 at 11:41 am

    For full browser interaction, your best bet is Selenium RC.

    It has a Python driver, and you can script a browser to “test” just about any site on the internet.
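
In current Selenium releases, RC has been superseded by the WebDriver API, which also has Python bindings. The following is a minimal sketch, assuming the selenium package, Firefox and its geckodriver are installed; rendered_links and its make_driver parameter are hypothetical names, and make_driver exists only so the function can be exercised with a stand-in object instead of a real browser. Because the page is loaded in a real browser, links inserted by JavaScript during page load are included; content fetched later (for example by AJAX calls) may need explicit waits.

```python
def rendered_links(url, make_driver=None):
    """Open url in a real browser and return the href of every <a> element.

    make_driver is a zero-argument callable returning a WebDriver-like object
    (get, find_elements, quit); by default it starts Firefox via Selenium.
    """
    if make_driver is None:
        # Imported lazily so the function can be defined without selenium installed.
        from selenium import webdriver
        make_driver = webdriver.Firefox
    driver = make_driver()
    try:
        driver.get(url)  # the browser runs the page's JavaScript while loading
        # "tag name" is the value of selenium.webdriver.common.by.By.TAG_NAME.
        anchors = driver.find_elements("tag name", "a")
        return [a.get_attribute("href") for a in anchors]
    finally:
        driver.quit()  # close the browser even if something above failed
```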

© 2021 The Archive Base. All Rights Reserved