I’d like to write a script to pre-fetch a list of domain names for my caching DNS server. I’m using the top 1,000,000 accessed websites from Alexa, available here:
http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
How can I write a Python script to read this CSV file and perform an “nslookup” (or more efficient way) on each domain name listed, perhaps with a slight delay between each query? Or is there a better way to do this?
I’m guessing it would be most efficient to process the CSV line by line rather than read it all at once to minimize memory usage.
Specifically, I’m looking for a strategy for approaching this problem (libraries, tools, etc …). Sample code is appreciated, but not necessary.
You can stick with the Python standard library, as it offers everything you need.
Since `open` returns an iterable file object (without loading the whole file into memory), you can loop over the file line by line and look up each domain as you go.
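The line-by-line approach could be sketched roughly as follows. This is not the code from the original answer; it assumes the archive has already been unzipped to a file named `top-1m.csv` and that each line has the form `rank,domain`. The helper name `iter_domains` is made up for illustration.

```python
def iter_domains(lines):
    """Yield the domain from each 'rank,domain' line.

    `lines` can be any iterable of strings, e.g. an open file object,
    so only one line is held in memory at a time.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first comma only: left side is the rank, right side the domain.
        rank, sep, domain = line.partition(",")
        if sep and domain:
            yield domain
```

It could then be used as `with open("top-1m.csv") as f: for domain in iter_domains(f): ...`, where the file name is an assumption about where the unzipped list was saved.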
I would not recommend the `csv` module here, since there are always just two values in each line. Use it if you need to handle things like quote characters, or if you need to write a CSV file and the like.
While `twisted` is also a great module for networking, it would be a little overkill for such a simple task. Just use the `socket` module.
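A minimal sketch of the lookup step with the `socket` module, including the slight delay the question asks about. It assumes the machine's system resolver points at the caching DNS server, so that every successful lookup populates its cache. The function name `prefetch`, its `delay` default, and the `resolve` parameter are illustrative choices, not part of any library.

```python
import socket
import time


def prefetch(domains, delay=0.05, resolve=socket.gethostbyname):
    """Resolve each domain once and return how many lookups succeeded.

    `delay` is the pause in seconds between queries (an arbitrary default).
    `resolve` is swappable so the loop can be exercised without network access.
    """
    resolved = 0
    for domain in domains:
        try:
            resolve(domain)  # the result is discarded; the goal is to warm the cache
            resolved += 1
        except OSError:
            # socket.gaierror and socket.herror are subclasses of OSError
            # and are raised for names that fail to resolve.
            pass
        time.sleep(delay)
    return resolved
```

`socket.gethostbyname` only returns IPv4 addresses; if AAAA records should be cached as well, `socket.getaddrinfo(domain, None)` may be a better fit for the `resolve` argument.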