I am looking for a web crawler solution that is mature enough and can be easily extended

Asked: May 13, 2026

I am looking for a web crawler solution that is mature enough and can be easily extended. I am interested in the following features, or in the possibility of extending the crawler to provide them:

in part, simply to read the feeds of several sites
to scrape the content of these sites
if a site has an archive, to crawl and index it as well
the crawler should be able to explore part of the Web for me and decide which sites match the given criteria
it should be able to notify me if content possibly matching my interests is found
the crawler should not overload servers with too many requests; it should crawl politely
the crawler should be robust against malformed sites and misbehaving servers

Each of the things above can be done individually without much effort, but I am interested in a solution that provides a customisable, extensible crawler. I have heard of Apache Nutch, but I am very unsure about the project so far. Do you have experience with it? Can you recommend alternatives?

1 Answer

Editorial Team · Answer 1 · 2026-05-13T11:36:44+00:00

A quick search on GitHub turned up Anemone, a web spider framework written in Ruby that seems to fit your requirements, particularly extensibility.
Hope it goes well!
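
Whichever framework you pick, the "don't kill the servers" requirement usually comes down to two checks before each request: does robots.txt allow the URL, and has enough time passed since the last hit on that host? Here is a minimal sketch of that idea in Python, using only the standard library. It is not taken from Anemone or Nutch; the class name `PoliteScheduler`, the user agent string, and the one-second default delay are all assumptions made for illustration, and fetching the robots.txt text itself is left to the caller.

```python
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class PoliteScheduler:
    """Hypothetical helper: decides whether and when a URL may be fetched."""

    def __init__(self, user_agent="example-crawler", min_delay=1.0,
                 clock=time.monotonic):
        self.user_agent = user_agent   # assumed name, sent by your fetcher
        self.min_delay = min_delay     # seconds between hits on one host
        self.clock = clock             # injectable so it can be tested
        self._robots = {}              # host -> parsed robots.txt rules
        self._last_hit = {}            # host -> time of the last request

    def load_robots(self, host, robots_txt):
        """Register the robots.txt text already downloaded for a host."""
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self._robots[host] = parser

    def allowed(self, url):
        """True if robots.txt permits the URL (or none is registered)."""
        parser = self._robots.get(urlparse(url).netloc)
        if parser is None:
            return True
        return parser.can_fetch(self.user_agent, url)

    def wait_time(self, url):
        """Seconds to sleep before this URL's host may be hit again."""
        last = self._last_hit.get(urlparse(url).netloc)
        if last is None:
            return 0.0
        return max(0.0, self.min_delay - (self.clock() - last))

    def record_fetch(self, url):
        """Call right after a request so the per-host delay is enforced."""
        self._last_hit[urlparse(url).netloc] = self.clock()
```

In a crawl loop you would skip any URL where `allowed(url)` is false, sleep for `wait_time(url)`, fetch the page, and then call `record_fetch(url)`. Keeping the delay per host rather than global lets the crawler stay busy on other sites while it waits.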
