I need to scrape a webpage that relies on JavaScript, and it looks like this is solved by the example at http://snipplr.com/view/66996/renderedinteractive-javascript-with-gtkwebkitjswebkit/
which is referred to in the question "Extracting data from Web". The code uses a WebKit downloader class. I understand that I need to invoke the process_request function. What do I pass in as the request parameter? I looked through the Scrapy documentation to see whether I have to pass a Request object created in Scrapy, but that does not work.
Also, I understand that the spider object is to be passed to process_request as the last parameter. Which object should that be? Sorry, I am new to Python, Scrapy and WebKit, so I may be asking questions with obvious answers.
You don’t “invoke” process_request manually; you only have to declare it, and the engine will invoke it with all the right parameters, including the request and the spider. Just create a file called middleware.py (or whatever you want to call it), put the WebKit downloader class from the snippet in it, and then register that class in your settings.py file under the DOWNLOADER_MIDDLEWARES setting.
That should get your middleware working.
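As a rough illustration of the two pieces described above, here is a minimal sketch. It is not the code from the linked snippet: the class body, the seen_urls attribute, the FakeRequest and FakeSpider stand-ins, and the myproject module path are all assumptions made for the example. The stand-ins only exist so the sketch runs without Scrapy installed; in a real project Scrapy creates the middleware and passes in the real Request and spider for you.

```python
# middleware.py: a sketch of a downloader middleware skeleton (assumed names).

class WebkitDownloader(object):
    """Scrapy calls process_request itself; you never call it by hand."""

    def __init__(self):
        # Hypothetical bookkeeping so the demo below has something to inspect.
        self.seen_urls = []

    def process_request(self, request, spider):
        # Scrapy's engine supplies both arguments for every outgoing request:
        #   request: the Request object that is about to be downloaded
        #   spider:  the running spider instance that issued that request
        self.seen_urls.append(request.url)
        # Returning None lets Scrapy continue with its normal download.
        # The linked snippet would instead render the page with WebKit here
        # and return a response built from the rendered HTML.
        return None


# settings.py: this dict belongs in the project's settings file, not in
# middleware.py. "myproject" is a placeholder for your own package name.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middleware.WebkitDownloader": 543,
}


# Stand-ins that mimic what the engine does, only for trying the sketch out.
class FakeRequest(object):
    def __init__(self, url):
        self.url = url


class FakeSpider(object):
    name = "example"


middleware = WebkitDownloader()
result = middleware.process_request(FakeRequest("http://example.com/"), FakeSpider())
```

The number in DOWNLOADER_MIDDLEWARES is only an ordering priority relative to the other downloader middlewares; 543 is an arbitrary choice here.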