One really newbie question.
I’m working on a small python script for my home use, that will collect data of a specific air ticket.
I want to extract the data from skyscanner (using BeautifulSoap and urllib). Example:
And I’m interested in all the data that are stored in this kind of element, specially the price: http://shrani.si/f/1w/An/1caIzEzT/capture.png
Because they are not located in the HTML, can I extract them?
I believe the problem is that these values are rendered through a javascript code which your browser runs and
urllibdoesn’t – You should use a library that can execute javascript code.I just googled
crawler python javascriptand I got the some stackoverflow questions and answers which recommends the use of selenium or webkit. You can use those libraries through scrapy. Here are two snippets:Rendered/interactive javascript with gtk/webkit/jswebkit
Rendered Javascript Crawler With Scrapy and Selenium RC