I’ve previously written applications, specifically data scrapers, in Node.js. These types of applications had

Question

Asked: June 8, 2026

I’ve previously written applications, specifically data scrapers, in Node.js. These types of applications had no web front end, but were merely processes timed with cron jobs to asynchronously make a number of possibly complicated HTTP GET requests to pull web pages, and then scrape and store the data from the results.

A sample of a function I might write would be this:

// Node.js

var request = require("request");

function scrapeEverything() {
    var listOfIds = [23423, 52356, 63462, 34673, 67436];

    for (var i = 0; i < listOfIds.length; i++) {
        request({uri: "http://mydatasite.com/?data_id=" + listOfIds[i]},
                function(err, response, body) {
                    // Skip failed requests instead of parsing an undefined body.
                    if (err) {
                        console.error(err);
                        return;
                    }
                    var jsonobj = JSON.parse(body);
                    storeMyData(jsonobj);
                });
    }
}

This function loops through the IDs and makes a bunch of asynchronous GET requests, from which it then stores the data.

I’m now writing a scraper in Python and attempting to do the same thing using Tornado, but everything I see in the documentation refers to Tornado acting as a web server, which is not what I’m looking for. Anyone know how to do this?


1 Answer

Editorial Team · Answer 1 · 2026-06-08T01:16:36+00:00

This turned out slightly more involved than I expected, but it is a quick demo of how to use Tornado’s IOLoop and AsyncHTTPClient to fetch some data. I’ve written a web crawler in Tornado myself, so it can certainly be used “headless”, without running a web server.

# Written for Tornado releases before 5.0: the io_loop argument to
# AsyncHTTPClient was removed in 5.0, and the callback argument to
# fetch() was removed in 6.0.
import tornado.ioloop
import tornado.httpclient

class Fetcher(object):
    def __init__(self, ioloop):
        self.ioloop = ioloop
        self.client = tornado.httpclient.AsyncHTTPClient(io_loop=ioloop)
        self.pending = 0  # requests started but not yet answered

    def fetch(self, url):
        self.pending += 1
        self.client.fetch(url, self.handle_response)

    @property
    def active(self):
        """True while at least one fetch is still in flight."""
        return self.pending > 0

    def handle_response(self, response):
        self.pending -= 1
        if response.error:
            print("Error: %s" % response.error)
        else:
            print("Got %d bytes" % len(response.body))

        # Stop the loop once the last outstanding response has arrived.
        if not self.active:
            self.ioloop.stop()

def main():
    ioloop = tornado.ioloop.IOLoop.instance()
    # Schedule the fetches to start as soon as the loop is running.
    ioloop.add_callback(scrapeEverything)
    ioloop.start()

def scrapeEverything():
    fetcher = Fetcher(tornado.ioloop.IOLoop.instance())

    listOfIds = [23423, 52356, 63462, 34673, 67436]

    for data_id in listOfIds:
        fetcher.fetch("http://mydatasite.com/?data_id=%d" % data_id)

if __name__ == '__main__':
    main()
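
The callback style above depends on the `callback` argument to `fetch()`, which Tornado removed in 6.0. On newer releases the same job could be written with coroutines. The following is only a rough sketch, assuming Tornado 6 or later and Python 3.7 or later. The helper names `build_url`, `tornado_fetch`, and `scrape_everything`, and the `fetch`/`store` parameters, are made up for illustration, and `store` stands in for the `storeMyData` function from the question.

```python
import asyncio
import json


def build_url(data_id):
    # Same placeholder URL pattern as in the question.
    return "http://mydatasite.com/?data_id=%d" % data_id


async def tornado_fetch(url):
    """Fetch one URL with Tornado's AsyncHTTPClient and return the body as text."""
    # Imported lazily so the rest of the sketch can be tried without Tornado.
    from tornado.httpclient import AsyncHTTPClient

    response = await AsyncHTTPClient().fetch(url)
    return response.body.decode("utf-8")


async def scrape_everything(ids, fetch=tornado_fetch, store=print):
    """Fetch all IDs concurrently, parse each body as JSON, pass it to store().

    Returns the number of IDs that were fetched and stored successfully.
    """
    async def scrape_one(data_id):
        try:
            body = await fetch(build_url(data_id))
            store(json.loads(body))
            return True
        except Exception as exc:
            # One failed request should not stop the others.
            print("Error for %d: %s" % (data_id, exc))
            return False

    # gather() schedules every coroutine before waiting, so the requests overlap.
    results = await asyncio.gather(*(scrape_one(i) for i in ids))
    return sum(results)
```

Running it would look something like `asyncio.run(scrape_everything([23423, 52356, 63462, 34673, 67436]))`. No web server or explicit `ioloop.stop()` bookkeeping is involved, because `asyncio.run` returns once every request has finished. Passing `fetch` as a parameter is just a convenience so the loop logic can be tried with a fake fetcher and no network.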
