I’m using HtmlUnit to programmatically fill out and submit a web form in Java. Here’s my code:
WebClient client = new WebClient();
client.setThrowExceptionOnScriptError(false); // avoid JavaScript errors
client.setTimeout(120000); // 2 minutes
HtmlPage page;
// load the website
page = client.getPage("http://www.some-website.com");
// represent the page elements in Java objects
// input fields and checkboxes first, then...
HtmlSubmitInput submit = form.getInputByName("submitbutton");
// set "value" attributes of input fields and checkboxes...
// submit the page
System.out.println("Submitting... ");
page = submit.click();
System.out.println("Done!");
// return the resulting HTML for scraping
return page.asXml();
At the submit.click() call, I keep getting the following exception:
java.net.SocketTimeoutException: Timeout while fetching: http://www.some-website.com
I understand that’s because I’m requesting data from 2002 through today. In my browser, the whole process normally takes around six minutes and returns about 24,200 rows of data.
I timed the interval between Submitting... being printed and the SocketTimeoutException being thrown, and it is always exactly one minute, even though I set the client timeout to two minutes. My understanding is that this setting is the timeout for initially loading the page (the client.getPage(...) call). Is there any way to set the timeout for the button click so that it waits longer than one minute, maybe ten?
As of this writing, there is no known solution to this problem. What I ended up doing was making multiple automated requests to the page to receive the data in parts. Basically, I queried for 2002 first, then 2003, 2004, and so on.
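For what it’s worth, the year-by-year splitting can be sketched roughly like this. It is only a sketch under assumptions: YearlyFetcher, fetchByYear, and the fetchYear callback are made-up names, and the lambda in main is a stand-in for the real HtmlUnit form submission for a single year, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical helper: splits one long-running query into one request per year.
public class YearlyFetcher {

    /**
     * Calls fetchYear once for each year in [startYear, endYear] and
     * returns the results in chronological order.
     */
    public static List<String> fetchByYear(int startYear, int endYear,
                                           IntFunction<String> fetchYear) {
        if (startYear > endYear) {
            throw new IllegalArgumentException("startYear must not be after endYear");
        }
        List<String> results = new ArrayList<>();
        for (int year = startYear; year <= endYear; year++) {
            // Each call is assumed to be short enough to finish before the timeout.
            results.add(fetchYear.apply(year));
        }
        return results;
    }

    public static void main(String[] args) {
        // Stand-in callback; in practice this would fill out the form for the
        // given year, click submit, and return page.asXml().
        List<String> pages = fetchByYear(2002, 2005,
                year -> "<rows year=\"" + year + "\"/>");
        System.out.println("Fetched " + pages.size() + " chunks");
    }
}
```

Each chunk comes back as its own HTML string, so the scraping step runs once per year and the rows are merged afterwards.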