I am looking for a way to, give a URL, get the source of a webpage back after the JavaScript has been run on it. For example:
I have a webpage with a .
On loading the page, some JavaScript populates the div.
Viewing the source of the page through a browser will not give the information which is within the div.
As far as I know, in order for the browser to render the page the div must have been filled with (X|D)HTML which would mean that the source of the page after being rendered is still just nested markup, so theoretically there should be a “final” version of the page source.
I have considered using a rendering engine like WebKit or Gecko and somehow adapting these to do this, however this is a fairly large task and I don’t really want to duplicate something which has already been done. Does anyone know of a way of performing this task.
Regards.
Update: I am aiming to use Selenium (as mentioned in the comments to the accepted answer) to do this automatically for several pages. My project is a web spider which by design needs to target a number of pages in which the content I am aiming to reach is not available until after the JavaScript has populated everything.
Within Firefox you can get the final rendered DIV by waiting the browser to finish rendering, then pressing ctrl-A to select all content on the page and finally selecting “Show selection source” from the right-click menu.
This shows you the manipulated/populated DOM-code of the page.