I am making anywhere from 1 to 10 web requests using jsdom (a web-scraping library for Node.js). It goes something like this:
    app.get('/results', function(req, res) {
        jsdom.env(
            "http://website1.com",
            ["http://code.jquery.com/jquery.js"],
            function (errors, window) {
                // scrape website #1
            }
        );
        jsdom.env(
            "http://website2.com",
            ["http://code.jquery.com/jquery.js"],
            function (errors, window) {
                // scrape website #2
            }
        );
        jsdom.env(
            "http://website3.com",
            ["http://code.jquery.com/jquery.js"],
            function (errors, window) {
                // scrape website #3
            }
        );
        // runs immediately, before any of the callbacks above have fired
        res.render('results', { items: items });
    });
How do I run res.render() ONLY after all jsdom requests have completed and I have gathered all the information I need? In a synchronous world this obviously would not be a problem, but since the jsdom calls are asynchronous, res.render() will run before any of the jsdom callbacks have finished.
Naive solution
The “naive” solution you could employ for a small number of scrapes is to nest everything: start each scrape in the callback of the previous scrape, and put the render call in the last callback.
That becomes tedious and illegible, of course. It also runs everything in series rather than in parallel, so it won’t be very fast.
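A rough sketch of what that nesting looks like, assuming a hypothetical helper `scrape(url, callback)` that wraps one `jsdom.env` call and calls back with `(err, result)`. Neither the helper nor the name `scrapeNested` comes from the question.

```javascript
// Naive approach: each scrape starts inside the previous scrape's callback.
// `scrape` is a hypothetical helper (url, callback) wrapping one jsdom.env call;
// `done` is called as done(err, items) once the last scrape has returned.
function scrapeNested(scrape, done) {
  var items = [];
  scrape('http://website1.com', function (err, first) {
    if (err) { return done(err); }
    items.push(first);
    scrape('http://website2.com', function (err, second) {
      if (err) { return done(err); }
      items.push(second);
      scrape('http://website3.com', function (err, third) {
        if (err) { return done(err); }
        items.push(third);
        done(null, items); // only at this depth is it safe to call res.render
      });
    });
  });
}
```

Every additional site adds another level of indentation, and no request starts until the previous one has finished.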
Better solution
The better solution would be to write a function that counts the returned results and calls render once all of them have returned.
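One possible shape for such a counter, as a minimal sketch rather than a definitive implementation. The names `afterAll` and `scrapeAll` are made up for illustration, and `scrape(url, callback)` is again a hypothetical helper wrapping one `jsdom.env` call.

```javascript
// Returns a function that each scrape calls when it finishes.
// `count` is how many scrapes to wait for; `done(err, results)` runs once,
// either on the first error or after all `count` results have arrived.
function afterAll(count, done) {
  var results = new Array(count);
  var remaining = count;
  var finished = false;

  return function report(index, err, value) {
    if (finished) { return; } // ignore callbacks that arrive after an error
    if (err) {
      finished = true;
      return done(err);
    }
    results[index] = value; // keep request order, not arrival order
    remaining -= 1;
    if (remaining === 0) {
      finished = true;
      done(null, results);
    }
  };
}

// Starts every scrape at once and calls done(err, items) when all have returned.
// `scrape` is a hypothetical helper (url, callback) wrapping one jsdom.env call.
function scrapeAll(scrape, urls, done) {
  if (urls.length === 0) { return done(null, []); }
  var report = afterAll(urls.length, done);
  urls.forEach(function (url, index) {
    scrape(url, function (err, value) {
      report(index, err, value);
    });
  });
}
```

Inside the route handler, the `done` callback passed to `scrapeAll` would be the place to call `res.render('results', { items: items })`.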
The best solution
Instead of writing your own, you should really learn to use an existing parallel / async library as suggested by @almypal in the comment to your question.
With async you could do something much neater, as described in the docs: https://github.com/caolan/async#parallel
Or, if all your scrapes actually look for the same elements in the resulting pages, you could even do a parallel map over an array of URLs to scrape: https://github.com/caolan/async#maparr-iterator-callback
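A sketch of the map variant, assuming every page is scraped the same way. The async module and a hypothetical iterator `scrape(url, callback)` are passed in as arguments so the snippet stands on its own; the name `makeMapHandler` is made up for illustration.

```javascript
// `async` is the caolan/async module (e.g. require('async')).
// `scrape` is a hypothetical iterator (url, callback) that wraps one
// jsdom.env call and calls back with (err, result).
function makeMapHandler(async, scrape, urls) {
  return function (req, res, next) {
    // async.map(arr, iterator, callback) runs `scrape` on every URL in
    // parallel and collects the results in the same order as `urls`.
    async.map(urls, scrape, function (err, items) {
      if (err) { return next(err); }
      res.render('results', { items: items });
    });
  };
}
```

It would be wired up along the lines of `app.get('/results', makeMapHandler(require('async'), scrape, urls))`.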
Each of your scrapes can use the callback function provided by async’s parallel method to return the results of its scrape. The final (optional) callback will contain your call to render with all the items.
EDIT: The example you asked for
This is your code, directly translated to the async library:
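The shape of such a translation can be sketched as follows, under a few assumptions: the async module and the jsdom module are passed in as arguments, `jsdom.env` is called exactly as in the question, and the page title stands in for whatever each scrape really extracts. The names `makeResultsHandler` and `task` are hypothetical.

```javascript
// `async` is the caolan/async module and `jsdom` is the jsdom module used in
// the question; both are passed in so the sketch stays self-contained.
function makeResultsHandler(async, jsdom) {
  var scripts = ['http://code.jquery.com/jquery.js'];

  // Builds one async.parallel task: a function that takes async's callback.
  function task(url) {
    return function (callback) {
      jsdom.env(url, scripts, function (errors, window) {
        if (errors) { return callback(errors); }
        // Scrape `window` here; the page title is only a placeholder result.
        callback(null, window.document.title);
      });
    };
  }

  return function (req, res, next) {
    async.parallel([
      task('http://website1.com'),
      task('http://website2.com'),
      task('http://website3.com')
    ], function (err, items) {
      // Runs once, after every task has called back or one has failed.
      if (err) { return next(err); }
      res.render('results', { items: items });
    });
  };
}
```

It would be registered with something like `app.get('/results', makeResultsHandler(require('async'), require('jsdom')))`. The `items` array arrives in the same order as the task list, regardless of which site responds first.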