I needed a headless browser to parse pages.
HtmlUnit allowed me to set up a Heroku Java app to fulfill this purpose.
But now I’m running into a couple of issues.
The current one is a malformed URL: “//path” instead of “/path” or “http(s)://path”.
I downloaded the sources of version 2.9.4 and pushed tiny fixes into them …
Modifying the standard sources is not really sustainable, for obvious maintainability reasons.
So I’m wondering whether I’m digging in the wrong direction.
HtmlUnit is designed to browse pages for testing purposes. My goal is to behave like a real browser and make pages work as far as possible, especially because my damned target websites are the ultra-dirty, not-respecting-anything kind…
What is your opinion on this?
HtmlUnit is used in Selenium 2/WebDriver for headless browser “simulation”. There it works fine.
So I see no reason not to try HtmlUnit. Maybe you can have a look at Selenium 2/WebDriver too.
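On the “//path” issue: if those links are meant as scheme-relative references (same protocol as the current page, different host), they can be resolved against the page URL with the standard java.net.URI class before they are handed to the browser, instead of patching the HtmlUnit sources. This is only a rough sketch under that assumption; the class name UrlNormalizer, the method resolveAgainst and the example.com URLs are made up for illustration and are not part of HtmlUnit.

```java
import java.net.URI;

// Hypothetical helper, not part of HtmlUnit: resolves a link found in a page
// against the URL of that page, using only the JDK.
final class UrlNormalizer {

    private UrlNormalizer() {
    }

    // baseUrl: absolute URL of the page that contains the link.
    // href: the raw link, e.g. "//host/path", "/path" or "http(s)://host/path".
    // Note: URI.create throws IllegalArgumentException for input it cannot
    // parse (spaces, for example), so really dirty links may need cleaning first.
    static String resolveAgainst(String baseUrl, String href) {
        URI base = URI.create(baseUrl);
        // A reference starting with "//" keeps the base scheme and takes
        // its host and path from the reference itself.
        return base.resolve(href.trim()).toString();
    }

    public static void main(String[] args) {
        String page = "https://example.com/shop/index.html";
        System.out.println(resolveAgainst(page, "//cdn.example.com/lib.js"));
        System.out.println(resolveAgainst(page, "/path"));
        System.out.println(resolveAgainst(page, "http://other.example.org/x"));
    }
}
```

If the sites actually mean “//path” as a path on the same host, this resolution would give the wrong result, and stripping the extra slash before resolving would be the thing to try instead.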