I have the following code:
String website = "http://www.somewebsite.com/";
Document doc = Jsoup.connect(website).get();
Elements aElements = doc.select("a");
for (Element element : aElements)
{
    System.out.println(element.attr("href"));
}
When I look at the output, the href values look like the following:
?nats=MzQ2NDAwLjQuNDYuNDYuMS43MDAxOTQ4LjAuMA&img=1
?nats=MzQ2NDAwLjQuNDYuNDYuMS43MDAxOTQ4LjAuMA&img=2
?nats=MzQ2NDAwLjQuNDYuNDYuMS43MDAxOTQ4LjAuMA&img=3
?nats=MzQ2NDAwLjQuNDYuNDYuMS43MDAxOTQ4LjAuMA&img=4
When I go to the webpage with my browser (Firefox) the href content looks like the following:
…/../../picture1.jpg
…/../../picture2.jpg
…/../../picture3.jpg
…/../../picture4.jpg
I’ve tried setting the “Referer” header to the website’s URL with the following code:
Document doc = Jsoup.connect(website).header("Referer", "http://www.somewebsite.com/").get();
But that doesn’t work.
How is it possible for the website to somehow “hide” the href content from my Jsoup “downloader” but show it when I’m actually browsing with my real browser?
How can I get around it?
Solved the problem by passing a user-agent string to the userAgent method.
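For illustration, here is a minimal sketch of what that call might look like. The user-agent string is an assumed example of a desktop Firefox value, not the one from the original fix, and the class and helper names (UserAgentFetch, connectAsBrowser) are hypothetical. A server can return different markup depending on the User-Agent header it receives, which would explain why the hrefs differed from what Firefox showed.

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class UserAgentFetch {
    // Assumption: an example Firefox User-Agent string, chosen for illustration only.
    static final String BROWSER_USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0";

    // Builds the connection with a browser-like User-Agent; no request is sent yet.
    static Connection connectAsBrowser(String url) {
        return Jsoup.connect(url).userAgent(BROWSER_USER_AGENT);
    }

    public static void main(String[] args) throws IOException {
        String website = "http://www.somewebsite.com/";

        // get() performs the HTTP request and parses the response body.
        Document doc = connectAsBrowser(website).get();

        // Print the href of every anchor, as in the original loop.
        for (Element element : doc.select("a")) {
            System.out.println(element.attr("href"));
        }
    }
}
```

Any user-agent string that the site accepts should work the same way; copying the one your own browser sends is a reasonable starting point.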