I’m trying to build something that crawls the content from a page with infinite scroll. However, I can’t get the stuff from below the first ‘break’. How do I do this?
Infinite scrolling is almost always implemented in JavaScript using AJAX or a related technology. As a result, it is not enough for your web crawler to fetch the HTML and parse it; it must also download and execute the JavaScript, or at least scan it for the AJAX calls.
Executing the JavaScript in full is probably the best option (i.e., the most likely to work), but it is also probably the hardest to do.
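As a rough sketch of the full-execution route, assuming you drive a real browser with something like Selenium: keep scrolling to the bottom until the page height stops growing. The function name `scroll_until_stable` and its parameters are made up for illustration; it only needs an object with an `execute_script` method, which Selenium WebDriver instances provide.

```python
import time


def scroll_until_stable(driver, pause_seconds=1.0, max_rounds=50):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is any object with an `execute_script(js)` method, such as a
    Selenium WebDriver. Returns the number of scrolls that loaded new content.
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    loaded = 0
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_seconds)  # crude wait for the AJAX response to render
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # nothing new appeared, so assume we reached the end
        last_height = new_height
        loaded += 1
    return loaded
```

Once it returns, the browser's rendered DOM should contain the content from below the first break, and you can parse it as usual. The fixed sleep is the weak point: a slow site may need a longer pause or an explicit wait for new elements.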
Scanning the JavaScript for AJAX requests, or looking for functions that make AJAX calls and then manipulate the DOM, will probably be easier than full JS execution.
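Something like the following might be a starting point for the scanning route. It is only a heuristic sketch: the regular expressions cover a few common call styles (`fetch`, jQuery's `$.get`/`$.post`/`$.getJSON`, `XMLHttpRequest.open`, and a `url:` option), and the helper names `find_ajax_urls` and `crawl_pages` are invented for this example. Minified or dynamically built URLs will slip past it.

```python
import re

# A quoted string literal; group 1 captures the text between the quotes.
_QUOTED = r"""['"]([^'"]+)['"]"""

# Heuristic patterns for common AJAX call styles; real sites vary widely.
AJAX_PATTERNS = [
    re.compile(r"\bfetch\(\s*" + _QUOTED),
    re.compile(r"\$\.(?:get|post|getJSON)\(\s*" + _QUOTED),
    re.compile(r"\.open\(\s*['\"][A-Za-z]+['\"]\s*,\s*" + _QUOTED),
    re.compile(r"\burl\s*:\s*" + _QUOTED),
]


def find_ajax_urls(js_source):
    """Return the distinct URL literals that look like AJAX endpoints."""
    found = []
    for pattern in AJAX_PATTERNS:
        for match in pattern.finditer(js_source):
            url = match.group(1)
            if url not in found:
                found.append(url)
    return found


def crawl_pages(fetch_page, max_pages=100):
    """Call fetch_page(1), fetch_page(2), ... until a page comes back empty.

    `fetch_page` is supplied by the caller and should request one page from
    the discovered endpoint and return a list of items.
    """
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break  # an empty page is taken to mean there is no more content
        items.extend(batch)
    return items
```

After you find the endpoint, `fetch_page` would typically request it directly with whatever paging parameter the site uses (a page number, an offset, or a cursor) and parse the JSON or HTML fragment that comes back. That parameter is site-specific, so you have to read it from the script or from the browser's network tab.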