I am parsing web pages with the HTML Agility Pack in VB.NET, and it works great most of the time, but I have come across a site I need help with.
When I fetch the page with my HTTP object (I am using Chilkat HTTP, which does not have a JavaScript engine), I get back a poorly written page that uses document.write calls to produce basically the entire page.
I do not want to use the browser control to first render the page.
Do you know of anything that will let me parse this page easily with XPath? Does XPath work with JavaScript? Is there a way for me to remove the JavaScript with the Agility Pack?
If the answer to all of the above is no, what would you do to get this into an XPath-compliant document?
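If most of the page is generated by JavaScript, you need to execute that JavaScript to get the final document. XPath only queries the parsed document tree; it does not run scripts, so markup that exists only inside document.write calls is just script text to the parser.
For this, you need a headless browser such as XBrowser, which can execute the JavaScript. You can then feed the resulting document to the HTML Agility Pack and query it with XPath as usual.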
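To illustrate why the raw page is not queryable, here is a rough sketch in Python (standard library only, chosen just to keep the example self-contained; it is not VB.NET or Agility Pack code). It shows that a parser sees no elements inside the script, and that in the narrow case where every document.write call passes a single string literal, the written markup can be stitched back together without a browser. The names `RAW_PAGE`, `rebuild_written_html`, and `tags_in` are made up for this example, and the sample page is invented. This is an assumption-heavy fallback, not a substitute for executing the JavaScript: concatenated strings, variables, loops, or external scripts will defeat it.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page: everything visible is produced by document.write.
RAW_PAGE = r"""<html><body>
<script>
document.write("<div id=\"main\">");
document.write('<h1>Prices</h1>');
document.writeln("<p class='item'>Widget<\/p>");
document.write("</div>");
</script>
</body></html>"""

# Matches document.write(...) / document.writeln(...) whose only argument is
# one quoted string literal. Anything more complex is deliberately not matched.
WRITE_CALL = re.compile(
    r"""document\.write(?P<ln>ln)?\(\s*(?P<q>["'])(?P<body>(?:\\.|(?!(?P=q))[^\\])*)(?P=q)\s*\)""",
    re.DOTALL,
)

# Only the most common JavaScript escapes get special treatment; any other
# escaped character (\" \' \\ \/) is replaced by the character itself.
SPECIAL_ESCAPES = {"n": "\n", "t": "\t", "r": "\r"}


def unescape_js(literal):
    """Undo simple backslash escapes inside a JavaScript string literal."""
    return re.sub(
        r"\\(.)",
        lambda m: SPECIAL_ESCAPES.get(m.group(1), m.group(1)),
        literal,
        flags=re.DOTALL,
    )


def rebuild_written_html(page_source):
    """Concatenate the literals passed to document.write, in source order."""
    parts = []
    for match in WRITE_CALL.finditer(page_source):
        parts.append(unescape_js(match.group("body")))
        if match.group("ln"):
            parts.append("\n")  # writeln appends a newline after the text
    return "".join(parts)


class TagCollector(HTMLParser):
    """Records the name of every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tags_in(html):
    """Return the start tags found when parsing html, in document order."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    # The raw page exposes only the wrapper elements and the script itself.
    print(tags_in(RAW_PAGE))
    # The rebuilt markup contains the elements you would actually query.
    print(tags_in(rebuild_written_html(RAW_PAGE)))
```

If the rebuilt string looks right for your site, it could be handed to the HTML Agility Pack in place of the original response; if it does not, that is a sign the page really needs a JavaScript engine.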