I have a web scraping script that gets new data once every minute, but

Asked: May 13, 2026 (2026-05-13T23:24:52+00:00)

I have a web scraping script that gets new data once every minute, but over the course of a couple of days the script ends up using 200 MB or more of memory. I found out it’s because mechanize keeps an unbounded browser history for the .back() function to use.

I looked through the docstrings and found the clear_history() method of the Browser class, which I now call each time I refresh, but memory usage still grows by 2-3 MB on each page refresh. Edit: Hmm, it seems it kept growing after I called clear_history() until it reached about 30 MB, then dropped back down to 10 MB or so (which is the base amount of memory my program starts up with)… Is there any way to force this behavior on a more regular basis?

How do I keep mechanize from storing all of this info? I don’t need to keep any of it. I’d like to keep my Python script below 15 MB of memory usage.

1 Answer

Editorial Team · Answer 1 · 2026-05-13T23:24:53+00:00

You can pass a history argument when you instantiate the Browser. The default value is None, which means the browser instantiates the History class itself (to allow back and reload). The simplest approach is a stub that stores nothing (it will raise an AttributeError if you ever do call back or reload):

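import mechanize

class NoHistory(object):
    """History stand-in that stores nothing, so memory does not grow."""
    def add(self, *a, **k): pass   # discard whatever the browser wants to record
    def clear(self): pass          # nothing is stored, so nothing to clear

b = mechanize.Browser(history=NoHistory())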

A cleaner approach would implement the other methods in NoHistory as well, to give clearer exceptions on erroneous use of the browser’s back or reload, but this simple one should suffice otherwise.
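The cleaner variant mentioned above might look roughly like the sketch below. Several details are assumptions rather than anything taken from the mechanize documentation: back() accepts arbitrary arguments because the exact parameters the browser passes are not stated here, RuntimeError is a stand-in for whatever exception type you prefer, and close() is included on the assumption that the browser calls it on its history object when the browser itself is closed.

```python
class NoHistory(object):
    """History stand-in that records nothing and fails loudly on back()."""

    def add(self, *args, **kwargs):
        # Discard the request/response pair instead of storing it.
        pass

    def clear(self):
        # Nothing is ever stored, so there is nothing to clear.
        pass

    def close(self):
        # Assumption: the browser may call close() on its history object.
        pass

    def back(self, *args, **kwargs):
        # Assumption: the exact arguments passed by the browser are unknown
        # here, so accept anything and raise a descriptive error.
        raise RuntimeError("history is disabled; back() is not available")
```

It is passed in the same way as the simpler version, for example mechanize.Browser(history=NoHistory()).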

Note that this is an elegant (though not well documented;-) use of the dependency injection design pattern: in a (bleah) “monkeypatching” world, the client code would be expected to overwrite b._history after the browser is instantiated, but with dependency injection you just pass in the “history” object you want to use. I’ve often maintained that Dependency Injection may be the most important DP that wasn’t in the “gang of 4” book!-).
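To make the contrast concrete, here is a toy sketch that does not use mechanize at all. ToyBrowser, ListHistory and NullHistory are hypothetical names invented for this illustration; only the shape of the constructor (a history argument defaulting to None) mirrors what the answer describes.

```python
class ListHistory(object):
    """Default collaborator: remembers every page it is given."""

    def __init__(self):
        self.pages = []

    def add(self, page):
        self.pages.append(page)


class NullHistory(object):
    """Replacement collaborator: remembers nothing."""

    def add(self, page):
        pass


class ToyBrowser(object):
    """Hypothetical stand-in for a browser that accepts an injected history."""

    def __init__(self, history=None):
        # Dependency injection: use the caller's object if one was supplied,
        # otherwise build the default.
        self._history = ListHistory() if history is None else history

    def open(self, url):
        self._history.add(url)
        return url


# Dependency injection: the collaborator is chosen at construction time.
injected = ToyBrowser(history=NullHistory())
injected.open("http://example.com/")

# Monkeypatching: the default is built first, then a private attribute
# is overwritten from outside the class.
patched = ToyBrowser()
patched._history = NullHistory()
patched.open("http://example.com/")
```

Both objects end up behaving the same way, but the injected one never creates the default history and never touches a private attribute.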
