I am trying to build a simple web scraper using Request and Cheerio.
The goal right now is to scrape the destination page (in this case http://bukk.it), grab the text from the target selectors on the page, and push it to an array that I can use in other functions.
I understand that request() is executing asynchronously, but I do not know how to make the scraped data visible outside the function.
example.js

```javascript
// dependencies
var request = require('request')
  , cheerio = require('cheerio');

// variables
var url = 'http://bukk.it/'; // url to scrape
var bukkits = []; // hold our scraped data

request(url, function (err, resp, body) {
  if (err) {
    return console.error(err);
  }
  var $ = cheerio.load(body);
  // for each of our targets (within the request body)...
  $('td a').each(function () {
    var content = $(this).text();
    // I would love to populate the bukkits array for use later...
    bukkits.push(content);
  });
  console.log(bukkits.length); // everything is working inside request
});

console.log(bukkits.length); // nothing, because request is asynchronous?
// that's cool but... how do I actually get the data from the request into bukkits[] ?
```
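Essentially, everything that depends on the scraped data must now take place inside the callback, or in functions called from it. Code written after the request() call runs straight away, before the response has arrived, so it will never have access to data that is retrieved asynchronously and passed to the callback.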
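To see the ordering problem in isolation, here is a minimal sketch that does not hit the network. It assumes a hypothetical `fakeRequest()` that stands in for `request()` and hands back a canned body after a short `setTimeout` delay; the comma split stands in for the Cheerio parsing, and `handleBukkits()` is likewise a made-up name for "the rest of your program". It records the order in which things actually happen.

```javascript
// Hypothetical stand-in for request(): delivers a canned body on a later tick,
// the way a real HTTP response would arrive after the current code has finished.
function fakeRequest(url, callback) {
  setTimeout(function () {
    callback(null, {}, 'one,two,three');
  }, 10);
}

var events = [];  // records the order in which things happen
var bukkits = []; // hold our scraped data

fakeRequest('http://bukk.it/', function (err, resp, body) {
  if (err) {
    return console.error(err);
  }
  // Stand-in for the cheerio parsing: pretend each item is one link text.
  body.split(',').forEach(function (text) {
    bukkits.push(text);
  });
  events.push('inside callback: ' + bukkits.length);
  handleBukkits(bukkits); // continue the program from inside the callback
});

// This line runs immediately, before the callback above has fired.
events.push('after request(): ' + bukkits.length);

// Hypothetical named function that receives the data once it exists.
function handleBukkits(items) {
  events.push('handleBukkits: ' + items.length);
  console.log(events.join('\n'));
}
```

Run with Node, the line after the request call is recorded first with a length of 0; only later do the callback and `handleBukkits()` run, and both see all three items.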
This isn’t as bad as it sounds. You can use named functions, like so: