I have a few desktop machines at different geographical locations. I need to create a crawler with a client on each desktop machine and a central server where the data is indexed. Is it possible to create such a crawler in Nutch? Are there any alternatives? Python-based crawlers would be preferable.
If you use Nutch as buffer suggested, there is a script on the Nutch Wiki that may be able to help you. You would first need to copy the linkdb, crawldb, and segments from each system to the central server before running it; I think accessing those resources remotely during indexing would take a long time.
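The copy step could be sketched roughly like this in Python, assuming each crawl machine is reachable over SSH and has `rsync` installed. The host names, the paths, and the function names `build_rsync_commands` and `pull_all` are all made up for illustration; only the three directory names come from the answer above. It is a starting point, not a tested deployment script.

```python
import subprocess
from pathlib import Path

# The three Nutch data directories that need to reach the central server.
NUTCH_DIRS = ("crawldb", "linkdb", "segments")


def build_rsync_commands(hosts, remote_crawl_dir, local_root):
    """Return one rsync command per (host, directory) pair.

    Each host's data goes into its own subdirectory of local_root so that
    copies from different machines never overwrite each other.
    """
    remote_base = remote_crawl_dir.rstrip("/")
    local_base = local_root.rstrip("/")
    commands = []
    for host in hosts:
        for name in NUTCH_DIRS:
            source = f"{host}:{remote_base}/{name}/"
            target = f"{local_base}/{host}/{name}/"
            # -a preserves the directory tree, -z compresses over the WAN.
            commands.append(["rsync", "-az", source, target])
    return commands


def pull_all(hosts, remote_crawl_dir, local_root, dry_run=True):
    """Print the rsync commands, or run them when dry_run is False."""
    for cmd in build_rsync_commands(hosts, remote_crawl_dir, local_root):
        if dry_run:
            print(" ".join(cmd))
        else:
            # Make sure the local target directory exists before copying.
            Path(cmd[-1]).mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical hosts and paths; replace with your own.
    pull_all(
        ["crawler1.example.com", "crawler2.example.com"],
        "/data/nutch/crawl",
        "/srv/nutch/incoming",
    )
```

With `dry_run=True` (the default) this only prints the commands, so you can check them before anything is transferred. Once the data is local, the merge and indexing would run against the copies under the local root rather than over the network.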