I am developing a crawler using the anemone gem (Ruby 1.8.7 and Rails 3.1.1). How should I skip pages with extensions such as pdf, doc, and zip?

Asked: May 27, 2026 (2026-05-27T08:40:10+00:00)


I am developing a crawler using the anemone gem (Ruby 1.8.7 and Rails 3.1.1). How should I skip web pages with extensions such as pdf, doc, zip, etc. so that they are not crawled or downloaded?


1 Answer

Editorial Team · Answer 1 · 2026-05-27T08:40:11+00:00


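require 'anemone'

# Extensions to skip (media, archives, documents and other non-HTML resources).
ext = %w(flv swf png jpg gif asx zip rar tar 7z gz jar js css dtd xsd ico raw mp3 mp4 wav wmv ape aac ac3 wma aiff mpg mpeg avi mov ogg mkv mka asf mp2 m1v m3u f4v pdf doc xls ppt pps bin exe rss xml)

Anemone.crawl(url) do |anemone|
    # Group the alternation so the leading dot and the end anchor apply to
    # every extension; /i also matches upper-case extensions such as .PDF.
    anemone.skip_links_like(/\.(#{ext.join('|')})$/i)

    ...
end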
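
The pattern passed to skip_links_like can be checked on its own before running a crawl. Without parentheses around the joined extensions, the alternation splits the whole expression, so the dot applies only to the first extension and the `$` anchor only to the last. The sketch below compares the two forms. The shortened extension list, the sample paths and the `skip?` helper are hypothetical and only for illustration, and it assumes the pattern is tested against each link's path.

```ruby
# Shortened, hypothetical extension list for illustration.
EXT = %w(pdf doc zip)

# Ungrouped: parses as /\.pdf/ OR /doc/ OR /zip$/.
UNGROUPED = /\.#{EXT.join('|')}$/
# Grouped: a dot, then one of the extensions, then end of string
# (case-insensitive).
GROUPED = /\.(#{EXT.join('|')})$/i

# Hypothetical helper: true when the path matches the skip pattern.
def skip?(pattern, path)
  !(path =~ pattern).nil?
end

paths = ['/files/report.pdf', '/files/REPORT.PDF', '/docs/index.html', '/archive.zip']
paths.each do |path|
  puts "#{path}: ungrouped=#{skip?(UNGROUPED, path)} grouped=#{skip?(GROUPED, path)}"
end
```

With the ungrouped form, an ordinary page such as /docs/index.html is skipped because "doc" appears anywhere in the path, while the grouped form skips only paths that actually end in one of the listed extensions.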
