I’m still a newcomer to python, so I hope this question isn’t inane. The

Question


Asked: May 18, 2026 (2026-05-18T11:41:30+00:00)


I’m still a newcomer to Python, so I hope this question isn’t inane.

The more I google for web scraping solutions, the more confused I become (I can’t see the forest, despite investigating many trees).

I’ve been reading documentation on a number of projects, including (but not limited to):
scrapy
mechanize
spynner

but I can’t really figure out which hammer I should be trying to use.

There is a specific site I’m trying to crawl (www.schooldigger.com).
It uses ASP, and there’s some JavaScript I need to be able to emulate.

I’m aware this sort of problem isn’t easily dealt with, so I’d love any guidance.

In addition to some general discussion of the options available (and the relationships between the different projects, if possible), I have a few specific questions:

When using scrapy, is there any way to avoid defining the ‘items’ to be parsed, and just download the first couple hundred pages or so? I don’t actually want to download entire websites, but I would like to be able to see which pages are being downloaded while developing the scraper.
mechanize, ASP and JavaScript: please see a question I posted but haven’t seen any answers to,
https://stackoverflow.com/questions/4249513/emulating-js-in-mechanize
Why not build some sort of utility (either a TurboGears application or a browser plug-in) that allows a user to select links to follow and items to parse graphically? All I’m suggesting is some sort of GUI to sit around a parsing API. I don’t know if I have the technical knowledge to create such a project, but I don’t see why it isn’t possible; in fact, it seems rather feasible given what I know about Python. Maybe some feedback about what problems this sort of project would face?
Most importantly, are all web crawlers built ‘site specific’? It seems to me that I’m sort of reinventing the wheel in my code (but that’s probably because I’m not very good at programming).
Does anyone have any examples of fully featured scrapers? There are lots of examples in the documentation (which I’ve been studying), but they all seem to focus on simplicity, just to demonstrate package usage. Maybe I’d benefit from a more detailed or complicated example.

Thanks for your thoughts.


1 Answer

Editorial Team · Answer 1 · 2026-05-18T11:41:30+00:00


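For full browser interaction, your best bet is to look at Selenium-RC.

It has a Python driver, and it lets you script a real browser to “test” just about any site on the internet. Because the page runs in an actual browser, its JavaScript executes as it would for a normal visitor, so there is nothing to emulate by hand.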
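As a rough illustration, here is a minimal sketch of loading a JavaScript-driven page and reading back the rendered HTML. It uses the Selenium WebDriver Python API (`driver.get`, `driver.page_source`, `driver.quit`), which superseded Selenium-RC, so treat it as an assumption about your setup rather than the exact Selenium-RC calls. The function name `fetch_rendered_html`, the `settle_seconds` parameter, and the choice of Firefox are all hypothetical; it assumes the `selenium` package, Firefox, and a matching browser driver are installed.

```python
import time


def fetch_rendered_html(url, driver=None, settle_seconds=2.0):
    """Load `url` in a browser and return the page source after scripts have run.

    `driver` may be any object with WebDriver-style `get`, `page_source`
    and `quit`; if omitted, a Firefox WebDriver is started (assumption).
    """
    owns_driver = driver is None
    if owns_driver:
        # Imported lazily so the function can also be tried with a stand-in driver.
        from selenium import webdriver
        driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Crude fixed pause to let the page's JavaScript finish; tune as needed.
        time.sleep(settle_seconds)
        return driver.page_source
    finally:
        # Only close the browser if this function opened it.
        if owns_driver:
            driver.quit()
```

Calling `fetch_rendered_html("http://www.schooldigger.com")` should hand back HTML that you can then feed to whatever parser you prefer. A fixed sleep is the simplest possible wait; for anything serious you would probably wait for a specific element to appear instead.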
