I am making a web crawler, and some pages redirect to others. How do I get the page that the original page redirected to?
On some sites, like xtema.com.br, I can get the redirect URL using the HttpURLConnection class with the getHeaderField("Location") method, but on others, like visa.com.br, the redirect is done with JavaScript or some other mechanism, and this method returns null.
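For reference, a minimal sketch of the Location-header approach described above. The class and method names (`LocationHeader`, `redirectTarget`) are made up for illustration. It assumes automatic redirect following is turned off so that only the first response is inspected.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class LocationHeader {
    /** Returns the resolved Location header of the first response, or null if there is none. */
    public static String redirectTarget(String pageUrl) throws IOException {
        URI page = URI.create(pageUrl);
        HttpURLConnection conn = (HttpURLConnection) page.toURL().openConnection();
        conn.setInstanceFollowRedirects(false); // inspect the first response only
        try {
            // Reading a header sends the request. The value is null when the
            // server did not answer with an HTTP redirect.
            String location = conn.getHeaderField("Location");
            // Location may be relative, so resolve it against the page URL.
            return location == null ? null : page.resolve(location).toString();
        } finally {
            conn.disconnect();
        }
    }
}
```

A null result only means there was no HTTP-level redirect; a redirect done by the page itself (JavaScript, meta refresh) is invisible at this level.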
Is there a way to always get the page and the URL that result from the redirect? The original page, before the redirect, is not important.
Thanks, and sorry for my bad English.
EDIT: Using httpConn.setInstanceFollowRedirects(true) to follow the redirects and returning the URL with httpConn.getURL() worked, but I have two issues.
1: httpConn.getURL() only returns the actual URL of the redirected page if I call httpConn.getDate() first. If I don't, it returns the original URL from before the redirects.
2: Some sites, like visa.com.br, answer with 200, but if I open them in a web browser, I see a different page.
E.g.: my program – visa.com.br – answer 200 (no redirects)
web browser – visa.com.br/go/principal.aspx – HTML code different from the version that I get in my program
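A 200 answer followed by a different page in the browser usually points to a client-side redirect, which HttpURLConnection never follows. One common form is a `<meta http-equiv="refresh">` tag, and a crawler can look for it in the downloaded HTML. The sketch below is only an assumption about what such a check could look like: the `MetaRefresh` class is hypothetical, and nothing here confirms that visa.com.br uses this mechanism. The regex is deliberately simple. It expects `http-equiv` before `content` and does not handle a URL wrapped in its own quotes.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaRefresh {
    // Matches tags such as <meta http-equiv="refresh" content="0; url=/some/path">
    private static final Pattern META = Pattern.compile(
        "<meta[^>]*http-equiv\\s*=\\s*[\"']?refresh[\"']?[^>]*content\\s*=\\s*[\"'][^\"'>]*?url\\s*=\\s*([^\"'>\\s]+)",
        Pattern.CASE_INSENSITIVE);

    /** Returns the raw redirect target found in the HTML, if any. */
    public static Optional<String> target(String html) {
        Matcher m = META.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Same as target(), but resolved against the URL the HTML was fetched from. */
    public static Optional<String> resolve(String pageUrl, String html) {
        return target(html).map(t -> URI.create(pageUrl).resolve(t).toString());
    }
}
```

A redirect performed by a script (for example, one that assigns to `window.location`) cannot be found reliably this way, because the target may be computed at run time. That case needs something that actually executes JavaScript, such as a headless browser.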
Use HttpURLConnection; it follows redirects by default. In case you want to see the redirected URL, you’ll have to do:
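A minimal sketch of one way to do that, not necessarily the code originally posted here. The class and method names (`RedirectProbe`, `finalUrl`) are made up for illustration. It relies on the fact that HttpURLConnection sends nothing until the response is actually requested, which is likely why getURL() seemed to need a getDate() call first.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class RedirectProbe {
    /** Returns the URL the connection ended up at after following HTTP redirects. */
    public static String finalUrl(String start) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(start).toURL().openConnection();
        conn.setInstanceFollowRedirects(true); // already the default; set here for clarity
        try {
            // This call sends the request and follows any 3xx responses.
            // Before it, getURL() still returns the original URL.
            conn.getResponseCode();
            return conn.getURL().toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // The default address is just a placeholder.
        System.out.println(finalUrl(args.length > 0 ? args[0] : "http://example.com/"));
    }
}
```

Any call that needs the response (getResponseCode(), getInputStream(), getDate(), getHeaderField(...)) has the same effect. Note that HttpURLConnection does not follow redirects that switch protocol, for example from http to https. In that case the Location header has to be read and followed by hand.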