I am making a web crawler, and some pages redirect to other pages.

Asked: May 21, 2026 (2026-05-21T19:18:12+00:00)

I am making a web crawler, and some pages redirect to other pages. How can I get the page that the original page redirected to?

On some sites, like xtema.com.br, I can get the redirect URL using the HttpURLConnection class with the getHeaderField("Location") method, but on others, like visa.com.br, the redirection is done with JavaScript or some other mechanism, and this method returns null.

Is there a way to always get the page and the URL that result from the redirection? The original page, before the redirection, is not important.

Thanks, and sorry for my bad English.

EDIT: Using httpConn.setInstanceFollowRedirects(true) to follow the redirections and reading the URL with httpConn.getURL() worked, but I have two issues.

1: httpConn.getURL() only returns the actual URL of the redirected page if I call httpConn.getDate() first. If I don't, it returns the original URL from before the redirections.

2: Some sites, like visa.com.br, answer with 200, but if I open them in a web browser, I see another page.
E.g.: my program: visa.com.br, response 200 (no redirections)
web browser: visa.com.br/go/principal.aspx, with HTML different from the version that I get in my program


1 Answer

Editorial Team · Answer 1 · 2026-05-21T19:18:13+00:00

Use HttpURLConnection; it follows HTTP redirects (3XX responses) by default, although not across protocols (for example, from http to https).

If you want to see each redirect URL yourself, turn off automatic following and walk the chain manually:

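httpConn.setInstanceFollowRedirects(false);
httpConn.connect();
int responseCode = httpConn.getResponseCode();
while ((responseCode / 100) == 3) { /* 3XX codes are redirections */
   String newLocationHeader = httpConn.getHeaderField("Location");
   if (newLocationHeader == null) break; /* e.g. 304 Not Modified has no target */
   /* Location may be relative, so resolve it against the current URL */
   URL next = new URL(httpConn.getURL(), newLocationHeader);
   httpConn.disconnect();
   /* open a new connection for the redirect target */
   httpConn = (HttpURLConnection) next.openConnection();
   httpConn.setInstanceFollowRedirects(false);
   responseCode = httpConn.getResponseCode();
   /* repeat until the response code is not a redirection */
}
/* httpConn.getURL() is now the URL of the last page reached */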
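
The loop above could be packaged as a small self-contained class. This is only a sketch under some assumptions: the class name RedirectTracer, the method names, and the MAX_HOPS limit of 10 are hypothetical and not part of the original answer, and only http and https URLs are expected. It uses java.net.URI to resolve relative Location values.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class RedirectTracer {
    /** Hypothetical cap on the chain length, to guard against redirect loops. */
    public static final int MAX_HOPS = 10;

    /** True for the status codes that normally carry a Location to follow. */
    public static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /** Resolves a possibly relative Location value against the current URI. */
    public static URI resolve(URI current, String location) {
        return current.resolve(location);
    }

    /** Follows HTTP-level redirects by hand and returns the last URI reached. */
    public static URI finalUri(URI start) throws IOException {
        URI current = start;
        for (int hop = 0; hop < MAX_HOPS; hop++) {
            // Assumes an http or https URI; other schemes would fail this cast.
            HttpURLConnection conn =
                    (HttpURLConnection) current.toURL().openConnection();
            conn.setInstanceFollowRedirects(false);
            try {
                int status = conn.getResponseCode(); // this call sends the request
                String location = conn.getHeaderField("Location");
                if (!isRedirect(status) || location == null) {
                    return current; // not a redirect: this is the final page
                }
                current = resolve(current, location);
            } finally {
                conn.disconnect();
            }
        }
        throw new IOException("Too many redirects starting at " + start);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java RedirectTracer <url>");
            return;
        }
        System.out.println(finalUri(URI.create(args[0])));
    }
}
```

Two notes on the issues from the question. HttpURLConnection connects lazily, so with automatic following enabled, getURL() only reflects the redirects after a call that actually reads the response, such as getResponseCode() or getDate(). And a loop like this only sees HTTP-level redirects: a page that answers 200 and then redirects through JavaScript or a meta refresh tag will not show up in the Location header, so handling that case would need parsing the HTML or running a real browser engine.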
