I have some Python code whose memory consumption grows steadily over time. While there are several objects that can legitimately grow quite large, I’m trying to understand whether the memory footprint I’m observing is due to these objects, or whether I’m just littering the memory with temporaries that don’t get properly disposed of. Being a recent convert from a world of manual memory management, I guess I don’t fully understand some very basic aspects of how the Python runtime deals with temporary objects.
Consider code with roughly this general structure (I am omitting irrelevant details):
import copy
import numpy

def tweak_list(lst):
    new_lst = copy.deepcopy(lst)
    if numpy.random.rand() > 0.5:
        new_lst[0] += 1  # in real code, the operation is a little more sensible :-)
        return new_lst
    else:
        return lst

lst = [1, 2, 3]
cache = {}
# main loop (xrange is Python 2; on Python 3 use range)
for step in xrange(some_large_number):
    lst = tweak_list(lst)  # <<-----(1)
    # do something with lst here, cut out for clarity
    cache[tuple(lst)] = 42  # <<-----(2)
    if step % chunk_size == 0:
        # dump the cache dict to a DB, free the memory (?)
        cache = {}  # <<-----(3)
Questions:
- What is the lifetime of a `new_lst` created in `tweak_list`? Will it be destroyed on exit, or will it be garbage collected (and if so, at which point)? Will repeated calls to `tweak_list` generate a gazillion small lists lingering around for a long time?
- Is a temporary created when converting a `list` to a `tuple` to be used as a `dict` key?
- Will setting a `dict` to an empty one release the memory?
- Or am I approaching the issue at hand from a completely wrong perspective?
`new_lst` is cleaned up when the function exits if it is not returned. Its reference count drops to 0, and it can be garbage collected. On current CPython implementations that happens immediately.

If it is returned, the value referenced by `new_lst` replaces `lst`; the list previously referred to by `lst` sees its reference count drop by 1, but the value originally referred to by `new_lst` is still being referred to by another variable.

The `tuple()` key is a value stored in the `dict`, so that’s not a temporary. No extra objects are created other than that tuple.

Replacing the old `cache` dict with a new one will reduce the old dict’s reference count by one. If `cache` was the only reference to the dict, it’ll be garbage collected. This then causes the reference count of every contained tuple key to drop by one. If nothing else references those keys, they will be garbage collected too.

Note that when Python frees memory, that does not necessarily mean the operating system reclaims it immediately. Most operating systems will only reclaim the memory when it is needed for something else, instead presuming the program might need some or all of that memory again soon.
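The reference-counting behaviour described above can be observed directly. What follows is only a minimal sketch, assuming CPython and Python 3. The names `TrackedList`, `Token`, `make_and_drop` and `events` are hypothetical and exist purely for illustration; they are not part of the code in the question. A list subclass is used because plain lists cannot be weakly referenced, and `weakref.finalize` records the moment an object is actually freed.

```python
import weakref


class TrackedList(list):
    """Hypothetical list subclass: plain lists cannot be weakly referenced."""


class Token:
    """Hypothetical hashable object used as an element of a tuple key."""


events = []


def make_and_drop():
    # A local that is never returned, like new_lst in the else branch.
    tmp = TrackedList([1, 2, 3])
    weakref.finalize(tmp, events.append, "tmp freed")
    events.append("leaving function")


make_and_drop()
events.append("after call")

# The tuple built for the key is the very object the dict stores.
cache = {}
key = tuple([1, 2, 3])
cache[key] = 42
stored_key_is_same = next(iter(cache)) is key

# Rebinding the name drops the old dict, then its keys, then their contents.
cache = {}
token = Token()
key = (token,)
cache[key] = 42
weakref.finalize(token, events.append, "key contents freed")
del token, key  # the dict now holds the only reference to the key
cache = {}      # the old dict is unreferenced; CPython frees it right away
events.append("after rebinding cache")

print(events)
print(stored_key_is_same)
```

On CPython, "tmp freed" should appear between "leaving function" and "after call", and "key contents freed" should appear before "after rebinding cache". Other implementations (PyPy, for example) do not use reference counting and may free these objects later, so the ordering is a CPython detail rather than a language guarantee.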