Possible Duplicate:
Make a JavaScript-aware Crawler
I’m trying to figure out what to use as the basis for a PHP-based web scraper that can handle pages rendered with JavaScript. Many scrape attempts (at least the ones I handle) now fail unless the JavaScript in the page is executed, because the pages are not built to fall back gracefully to a no-script version. This includes pages that make heavy use of AJAX.
Does anyone have suggestions for where to start when developing a web scraper that can handle modern, heavily JavaScript-dependent web pages?
Something that can be used by PHP would be best.
It’s possible to use a web browser engine in headless mode to load the page and analyze the DOM. Some googling pointed me at http://phantomjs.org/
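As a rough idea of what that could look like, here is a minimal sketch of a PhantomJS script that loads a URL, lets the page's JavaScript run, and prints the rendered DOM as HTML. The file name `render.js`, the helper `pickUrl`, and the 2000 ms wait are my own assumptions for illustration, not anything PhantomJS prescribes, and a fixed delay is only a crude way to wait for AJAX calls to finish.

```javascript
// render.js (hypothetical name): run as `phantomjs render.js <url>`.
// Written in ES5 because PhantomJS does not support newer syntax.

// Hypothetical helper: return the URL argument, or null if it is missing
// or is not an http(s) URL. args[0] is the script name under PhantomJS.
function pickUrl(args) {
  var url = args[1];
  if (!url || !/^https?:\/\//.test(url)) {
    return null;
  }
  return url;
}

// The `phantom` global only exists when the script is run by PhantomJS.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var page = require('webpage').create();
  var url = pickUrl(system.args);

  if (url === null) {
    system.stderr.writeLine('usage: phantomjs render.js <http(s) url>');
    phantom.exit(1);
  } else {
    page.open(url, function (status) {
      if (status !== 'success') {
        system.stderr.writeLine('failed to load ' + url);
        phantom.exit(1);
        return;
      }
      // Assumption: 2000 ms is enough for the page's scripts to settle.
      window.setTimeout(function () {
        // page.content is the current DOM serialized as HTML.
        system.stdout.write(page.content);
        phantom.exit(0);
      }, 2000);
    });
  }
}
```

From PHP, something like `shell_exec('phantomjs render.js ' . escapeshellarg($url))` should then hand you the rendered HTML as a string, which you can feed to whatever parser you already use.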