I need to capture a website and am looking for an appropriate library or program to do this. The site uses JavaScript to push updates to the page, and I need to capture these updates as well as the page itself. I am using curl to capture the page itself, but I don't know how to capture the updates. Given a choice, I would use C++.
Regards
Install Firefox and GreaseMonkey. Have the GM script add DOM mutation listeners where appropriate to track modifications. You can then use XMLHttpRequest to send the information to a server, or write it to local files with XPCOM file I/O operations.
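As a rough sketch of the idea (not a drop-in script): the userscript below uses the `MutationObserver` API to watch the page and POSTs each batch of changes to a collector endpoint. The `@match` pattern, the `/collect` endpoint, and the `summarizeMutation` helper are all assumptions for illustration; the serialization is deliberately split into a pure function so it can be tested outside the browser.

```javascript
// ==UserScript==
// @name   Capture DOM updates
// @match  *://example.com/*
// ==/UserScript==
// (example.com is a placeholder for the site you want to capture)

// Serialize one mutation into a plain object we can POST as JSON.
// Pure function: takes pre-extracted fields, no DOM types needed.
function summarizeMutation(m) {
  return {
    type: m.type,                 // "childList", "attributes", "characterData"
    target: m.targetPath || "",   // e.g. the target node's tag name
    added: m.addedCount || 0,     // number of nodes added
    removed: m.removedCount || 0, // number of nodes removed
    when: m.when || Date.now()    // timestamp of the change
  };
}

// Browser-only wiring (guarded so the file also loads outside a browser).
if (typeof MutationObserver !== "undefined" && typeof document !== "undefined") {
  var observer = new MutationObserver(function (mutations) {
    var payload = mutations.map(function (m) {
      return summarizeMutation({
        type: m.type,
        targetPath: m.target.nodeName,
        addedCount: m.addedNodes.length,
        removedCount: m.removedNodes.length,
        when: Date.now()
      });
    });
    // Ship the batch to your own server; /collect is an assumed endpoint.
    var xhr = new XMLHttpRequest();
    xhr.open("POST", "http://localhost:8080/collect");
    xhr.setRequestHeader("Content-Type", "application/json");
    xhr.send(JSON.stringify(payload));
  });
  // Observe the whole document for structural, attribute, and text changes.
  observer.observe(document.documentElement, {
    childList: true,
    attributes: true,
    characterData: true,
    subtree: true
  });
}
```

If you want the full serialized HTML after each change rather than a diff, you can send `document.documentElement.outerHTML` in the same callback instead.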
With this approach, you can do what you want in a dozen lines with little to no reverse engineering, whereas what others have advised (screen scraping) will, IMO, require thousands of lines of code for a JavaScript-heavy site.
Addendum: this is /not/ a job for C++. And should you do it in C++ anyway, you will end up having to reverse engineer the site's JS, so you might as well just learn enough JS to use GreaseMonkey in the first place.