I am trying to parse an HTML page, but I need to filter the results before I parse it.
For instance, http://www.ksl.com/index.php?nid=443 is a classified listing of cars in Utah. Instead of parsing ALL the cars, I’d like to filter the listing first (e.g. find all BMWs) and then parse only those pages. Is it possible to fill in a JavaScript form with Python?
Here’s what I have so far:
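import urllib.request  # Python 3; in Python 2 this was urllib.urlopen

# Download the listing page as raw bytes
content = urllib.request.urlopen('http://www.ksl.com/index.php?nid=443').read()
# read() returns bytes, so write the file in binary mode
with open('/var/www/bmw.html', 'wb') as f:
    f.write(content)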
Here is one way to do it. First download the page and scrape it for the models you are looking for; the matching entries give you the links to the individual pages, which you can then scrape in turn. There is no need for JavaScript here. This example and the BeautifulSoup documentation will get you going.
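As a rough sketch of those steps, the filtering could look something like the code below. It uses the standard library's html.parser as a stand-in for BeautifulSoup so that nothing has to be installed. The names LinkCollector and find_listing_links, the SAMPLE markup, and the /cars/ paths are all made up for illustration; the real page's structure will differ, so treat the matching rule (keyword in the link text) as an assumption to adapt.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_listing_links(html, keyword, base_url):
    """Return absolute URLs of links whose text mentions keyword."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(base_url, href)
            for href, text in parser.links
            if keyword.lower() in text.lower()]


# Hypothetical markup, invented for illustration; the real page differs.
SAMPLE = """
<a href="/cars/1">2004 Honda Civic</a>
<a href="/cars/2">2008 BMW 328i</a>
<a href="/cars/3">1999 honda Accord</a>
"""

honda_links = find_listing_links(SAMPLE, "Honda", "http://www.ksl.com/")
```

To run it against the live site, pass the decoded page content instead of SAMPLE, then fetch and parse each returned URL in a loop.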
At the time of writing, this code snippet looks for Honda models and returns the following: