In the example below I would like anemone to only execute on the root URL (example.com). I am unsure if I should apply the on_page_like method and if so what pattern I would need.
require 'anemone'
Anemone.crawl("http://www.example.com/") do |anemone|
anemone.on_pages_like(???) do |page|
# some code to execute
end
end
You can also specify the following in the options hash, below are the defaults:
My personal experience is that Anemone was not very fast and had a lot of corner cases. The docs are lacking (as you have experienced) and the author doesn’t seem to be maintaining the project. YMMV. I tried Nutch shortly but didn’t play aroud as much but it seemed faster. No benchmarks, sorry.