I use Jsoup to scrape the website:
doc = Jsoup.connect(String.valueOf(urls[0])).userAgent("Mozilla").get();
Here is the link:
http://www.yelp.com/search?find_desc=restaurant&find_loc=willowbrook%2C+IL&ns=1#l=p:IL:Willowbrook::&sortby=rating&rpp=40
I have added the rpp=40 parameter to the link in the command line to display 40 results per page. I’m able to see all the results in the browser's page source view.
I know that Jsoup handles static content only and cannot fetch content that a site generates with AJAX/JS libraries. However, why can't Jsoup retrieve the same content that I can see in the browser via view source? View source shows 40 results, whereas Jsoup retrieves elements from only 10 results. How can I obtain every element visible via view source?
Short answer: Jsoup can’t execute JavaScript.
Long answer:
The page you are requesting accepts an HTTP GET with parameters. A normal browser sends those parameters and loads the page, but not with Willowbrook checked (in your example). Only after the page has loaded does the JavaScript run, tick the filter checkbox, and filter the search results. Jsoup never runs that script, so what it gets is the page for ‘state=IL’ without the ‘willowbrook’ filter applied, which is a different (broader, unfiltered) set of results from the one you see in the browser.
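One related detail is worth checking in the URL itself: everything after the # is a fragment, and HTTP clients (browsers and Jsoup alike) do not send the fragment to the server. In the link from the question, sortby=rating and rpp=40 come after the #, so only script running in the browser can act on them. Here is a minimal sketch that splits the URL with the standard java.net.URI class; the class name FragmentDemo and its helper methods are made up for illustration and are not part of Jsoup:

```java
import java.net.URI;

public class FragmentDemo {
    // The URL from the question, unchanged.
    static final String QUESTION_URL =
        "http://www.yelp.com/search?find_desc=restaurant&find_loc=willowbrook%2C+IL&ns=1"
        + "#l=p:IL:Willowbrook::&sortby=rating&rpp=40";

    /** Query string: the part between '?' and '#', which is sent to the server. */
    static String query(String url) {
        return URI.create(url).getRawQuery();
    }

    /** Fragment: everything after '#', which never leaves the client. */
    static String fragment(String url) {
        return URI.create(url).getRawFragment();
    }

    /** The URL with the fragment dropped, i.e. what an HTTP GET actually requests. */
    static String withoutFragment(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    public static void main(String[] args) {
        System.out.println("query:    " + query(QUESTION_URL));
        System.out.println("fragment: " + fragment(QUESTION_URL));
        System.out.println("request:  " + withoutFragment(QUESTION_URL));
    }
}
```

Running it shows that rpp=40 ends up in the fragment, not in the query. If the site also accepts these settings as ordinary query parameters (I have not verified that it does), moving them in front of the # would be one thing to try; otherwise you need a tool that actually runs the page's JavaScript.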