As part of a small project I’m doing that will never actually go into ‘production’, I need to be able to log into a very secure website and retrieve HTML.
I’ve been looking into doing this using the Apache Commons HttpClient. However, I just wanted to make sure it is even possible, as this website is very secure and likely uses SSO to sign in.
If it is possible, what is the best way to do this? I need to be able to navigate through about three pages once I have logged in, so I will need to store the cookie or session somehow.
Thanks very much!
Yes, it’s possible to do this using Apache HttpComponents, but for interacting with complex websites nothing (that I know of) beats HtmlUnit. To work with HttpComponents you’d need to “script” the whole sequence of HTTP requests, and you’ll have a problem if anything in the middle relies on dynamic content/JavaScript.
HtmlUnit, on the other hand, is an almost complete “browser in a box”, and you can script the interaction at a much higher level: click this, fill in those values, submit, etc.
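To give a rough idea of what “scripting the whole sequence of HTTP requests” with a stored session cookie looks like, here is a minimal sketch. It deliberately uses the JDK’s built-in java.net.http.HttpClient (Java 11+) with a CookieManager instead of Apache HttpComponents, so that it runs without extra dependencies; HttpComponents offers the same idea through its own cookie store. Everything about the “site” is an assumption made up for illustration: the local stand-in server, the /login and /page paths, the SESSION cookie and the form fields. A real SSO login will probably involve redirects and hidden form tokens that this sketch does not handle.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.CookieManager;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class SessionSketch {

    // Hypothetical stand-in for the real site: /login sets a session cookie,
    // /page only returns the HTML when that cookie is sent back.
    static HttpServer startFakeSite() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/login", ex -> {
            ex.getRequestBody().readAllBytes(); // drain the posted form
            ex.getResponseHeaders().add("Set-Cookie", "SESSION=abc123; Path=/");
            byte[] body = "logged in".getBytes(StandardCharsets.UTF_8);
            ex.sendResponseHeaders(200, body.length);
            ex.getResponseBody().write(body);
            ex.close();
        });
        server.createContext("/page", ex -> {
            String cookie = ex.getRequestHeaders().getFirst("Cookie");
            boolean ok = cookie != null && cookie.contains("SESSION=abc123");
            byte[] body = (ok ? "<html>secret</html>" : "denied")
                    .getBytes(StandardCharsets.UTF_8);
            ex.sendResponseHeaders(ok ? 200 : 403, body.length);
            ex.getResponseBody().write(body);
            ex.close();
        });
        server.start();
        return server;
    }

    // Logs in once, then fetches a protected page with the same client.
    static String loginAndFetch(String baseUrl) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(new CookieManager()) // remembers cookies between requests
                .build();

        // Step 1: post the (made-up) credentials; the response sets the cookie.
        HttpRequest login = HttpRequest.newBuilder(URI.create(baseUrl + "/login"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString("user=me&pass=secret"))
                .build();
        client.send(login, HttpResponse.BodyHandlers.ofString());

        // Step 2: request a later page; the stored cookie is attached automatically.
        HttpRequest page = HttpRequest.newBuilder(URI.create(baseUrl + "/page"))
                .GET()
                .build();
        return client.send(page, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Starts the stand-in site, runs the two-step sequence, and shuts down.
    static String demo() throws Exception {
        HttpServer server = startFakeSite();
        try {
            int port = server.getAddress().getPort();
            return loginAndFetch("http://127.0.0.1:" + port);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

The important part is that the same client, and therefore the same cookie store, is reused for every request; navigating through the other pages is just more requests on that client. If any step depends on JavaScript, this approach stops working, which is where HtmlUnit comes in.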