I use the mechanize gem to crawl websites. I wrote a very simple, single-threaded crawler


Asked: May 29, 2026


I use the mechanize gem to crawl websites. I wrote a very simple, single-threaded crawler inside a Rails rake task because I needed access to my Rails models.

The crawler runs fine, but after watching it for a while I can see that its RAM usage keeps growing over time.

I use the God gem to monitor the crawler.

Below is my rake task. Is there anything in it that could leak memory?

```ruby
task :abc => :environment do
  prefix_url  = 'http://example.com/abc-'
  postfix_url = '.html'
  # Resume from the page after the last one recorded in AppConfig
  from_page_id = (AppConfig.last_crawled_id || 1) + 1
  to_page_id   = 100_000

  agent = Mechanize.new
  agent.user_agent_alias = 'Mac Safari'

  (from_page_id..to_page_id).each do |i|
    url = "#{prefix_url}#{i}#{postfix_url}"
    puts "#{Time.now} - Crawl #{url}"
    page = agent.get(url)

    # In each <ul> under #content, the first <li> is stored as var
    # and the second as value
    page.search('#content > ul').each do |s|
      items = s.css('li')
      MyModel.create :var => items[0].text, :value => items[1].text
    end

    # Record progress so a restarted crawler can pick up from here
    AppConfig.last_crawled_id = i
  end

  # Finished crawling, so ask God to stop this watch
  `god stop crawl_abc`
end
```


1 Answer

Editorial Team · Answer 1 · 2026-05-29T06:58:33+00:00


Unless you have the very latest version of mechanize (2.1.1 was released only a day or so ago), mechanize by default keeps an unlimited history: it holds on to every page you have visited, so it gradually uses more and more memory.

In your case there is no point keeping that history, so setting `max_history=` on your agent should limit how much memory it uses this way.
