I am currently working on a project which is using Apache HttpClient 4.1.2 and

Asked: May 26, 2026 (2026-05-26T23:56:26+00:00)

I am currently working on a project that uses Apache HttpClient 4.1.2 to retrieve some data from a website.

What the application does: it requests a webpage and then follows the pages found there until it reaches the end (e.g. fetch page 1, find 20 more pages, then fetch each of those 20 pages). The problem is that it gets stuck while retrieving some random pages and does not continue the crawl.
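
The crawl described above (fetch page 1, discover more pages, then fetch each of them) can be sketched as a work queue with a visited set. This is only an illustration under assumptions: `fetch` and `findLinks` are hypothetical stand-ins for the real HTTP request and the HTML parsing, and pages are identified by plain strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class CrawlSketch {
    /**
     * Breadth-first crawl from startPage. fetch (hypothetical) returns a page's content,
     * findLinks (hypothetical) returns the page names discovered in that content.
     * Returns the pages in the order they were visited.
     */
    static List<String> crawl(String startPage,
                              Function<String, String> fetch,
                              Function<String, List<String>> findLinks) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> visited = new ArrayList<>();
        queue.add(startPage);
        seen.add(startPage);
        while (!queue.isEmpty()) {
            String page = queue.poll();
            String content = fetch.apply(page);
            visited.add(page);
            for (String next : findLinks.apply(content)) {
                if (seen.add(next)) { // enqueue each page only once
                    queue.add(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the site: page name -> pages it links to
        Map<String, List<String>> site = Map.of(
            "1", List.of("2", "3"),
            "2", List.of("3"),
            "3", List.of());
        // "Fetching" a page just returns its name, which is then looked up in the map
        System.out.println(crawl("1", page -> page, site::get));
    }
}
```

With the in-memory map in `main`, this prints `[1, 2, 3]`; the visited set is what keeps a page that is linked from several other pages from being fetched twice.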

Here is some code:

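import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

DefaultHttpClient mainHttp;
HttpPost post;
HttpResponse response;
HttpEntity entity;
String s;
int curPage = 1;
int index = 0;
boolean ok = true;

...

// Fetch pages until the "[curPage]" marker no longer appears in the response
while (ok) {
  response = mainHttp.execute(post);
  entity = response.getEntity();
  if (entity != null) {
    System.out.println("Enter " + curPage);
    // Reads the whole body; blocks until the server has sent all of it
    s = EntityUtils.toString(entity);
    System.out.println("Exit " + curPage);
    index = s.indexOf("[" + curPage + "]");
    if (index > 0) {
      parseContent();
    } else {
      ok = false;
    }
  }
}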

The debug window shows something like this:

Enter 1
Exit 1
...
Enter n

I am also using an HTTP request analyzer, and I saw that for the page where it gets stuck, the data is not retrieved completely (it never reaches </html> or the end of the page).
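
Since "Enter n" is printed but "Exit n" never is, the thread is stuck inside `EntityUtils.toString` waiting for the rest of the body. One way to put an upper bound on the whole download (rather than on each individual read) is to run the read on a worker thread and wait for it with `Future.get(timeout, unit)`. The sketch below uses only `java.util.concurrent`; the blocking read is passed in as a `Callable`, and `readWithDeadline` is a hypothetical helper name. With HttpClient 4.x, calling `post.abort()` on timeout is what would actually close the connection under the stuck read; here that step is only indicated in a comment.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class DeadlineRead {
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "body-reader");
        t.setDaemon(true); // a stuck read should not keep the JVM alive
        return t;
    });

    /**
     * Hypothetical helper: runs a blocking read and waits at most timeoutMs for it.
     * Returns the result, or Optional.empty() if the deadline passed or the read failed.
     */
    static Optional<String> readWithDeadline(Callable<String> read, long timeoutMs) {
        Future<String> future = POOL.submit(read);
        try {
            return Optional.ofNullable(future.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Interrupting does not unblock classic socket I/O; with HttpClient 4.x,
            // post.abort() here would close the connection under the stuck read.
            future.cancel(true);
            return Optional.empty();
        } catch (ExecutionException e) {
            return Optional.empty(); // the read itself threw, e.g. an IOException
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(readWithDeadline(() -> "<html>...</html>", 15000));
        System.out.println(readWithDeadline(() -> { Thread.sleep(60_000); return "late"; }, 200));
    }
}
```

Running `main` prints `Optional[<html>...</html>]` and then, after about 200 ms, `Optional.empty`; an empty result is the signal to retry or skip that page.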

What can I do to skip or retry downloading the data in such cases? Can anyone help me?

Thank you!

Later edit:

The actual settings were:

mainHttp.setHttpRequestRetryHandler(new DefaultHttpRequestRetryHandler(1, true));
mainHttp.getParams().setParameter("http.connection-manager.timeout", 15000);
mainHttp.getParams().setParameter("http.socket.timeout", 15000);
mainHttp.getParams().setParameter("http.connection.timeout", 15000);

where 15000 is the timeout in milliseconds.
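
`http.socket.timeout` (`CoreConnectionPNames.SO_TIMEOUT` in HttpClient 4.x) is the socket's `SO_TIMEOUT`: the maximum period of inactivity between two consecutive reads, not a limit on the whole response. A read that receives no data for that long fails with `SocketTimeoutException`, but a server that keeps sending a few bytes within every 15-second window can keep a download going indefinitely. The sketch below uses plain `java.net.Socket` on the loopback interface (not HttpClient) to show the mechanism, with a local server that accepts the connection and never writes anything; `readTimesOut` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SoTimeoutDemo {
    /**
     * Connects to a local server that never sends data and reads with the given SO_TIMEOUT.
     * Returns true if the read failed with SocketTimeoutException.
     */
    static boolean readTimesOut(int soTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);   // any free port
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket silentPeer = server.accept()) {                    // accepts, never writes
            client.setSoTimeout(soTimeoutMs); // the socket option "http.socket.timeout" controls
            InputStream in = client.getInputStream();
            try {
                in.read(); // blocks until data arrives, the peer closes, or the timeout fires
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

Running `main` prints `read timed out: true` after roughly 200 ms. If the peer sent one byte every 100 ms instead, each read would succeed and the timeout would never fire, which is why a total deadline on the download can still be useful.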

Thank you for your help.

1 Answer

Editorial Team · Answer 1 · 2026-05-26T23:56:27+00:00

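// HttpClient 3.x API (org.apache.commons.httpclient): retry once, also when the request was already sent
DefaultHttpMethodRetryHandler retryhandler = new DefaultHttpMethodRetryHandler(1, true);
mainHttp.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, retryhandler);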

Source: http://hc.apache.org/httpclient-3.x/tutorial.html (Method recovery)

But this only helps when an exception is actually thrown, so try checking for IOExceptions every time you make a request.
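
A retry handler is only consulted when an exception is thrown while `execute()` runs; the body is read afterwards by `EntityUtils.toString`, so a failure there never reaches the handler. In addition, HttpClient 4.x's `DefaultHttpRequestRetryHandler` does not retry an `InterruptedIOException`, and `SocketTimeoutException` is a subclass of it. Catching the `IOException` around each whole request-plus-read covers both cases. Below is a minimal stdlib sketch of that idea: a hypothetical `withRetries` helper that retries an action a fixed number of times on `IOException` and returns `Optional.empty()` when every attempt fails, so the caller can skip the page and continue the crawl.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.Optional;

public class RetrySketch {
    /** Hypothetical action that may throw IOException, e.g. execute() plus EntityUtils.toString(). */
    interface IoAction<T> {
        T run() throws IOException;
    }

    /**
     * Runs action up to maxAttempts times, retrying only on IOException.
     * Returns the first successful result wrapped in an Optional, or Optional.empty() if all attempts failed.
     */
    static <T> Optional<T> withRetries(int maxAttempts, IoAction<T> action) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return Optional.ofNullable(action.run());
            } catch (IOException e) {
                System.err.println("attempt " + attempt + " failed: " + e);
            }
        }
        return Optional.empty(); // caller can log and skip this page
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails twice with a timeout, then succeeds on the third attempt
        Optional<String> page = withRetries(3, () -> {
            if (++calls[0] < 3) throw new SocketTimeoutException("Read timed out");
            return "<html>page</html>";
        });
        System.out.println(page.orElse("skipped") + " after " + calls[0] + " attempts");
    }
}
```

Running `main` prints `<html>page</html> after 3 attempts` on standard output, after two failure messages on standard error. Combined with a deadline on the read, this gives both options asked about: retry a stalled page a few times, then skip it.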
