I know that Python dicts will “leak” when items are removed (because the item’s slot will be overwritten with the magic “removed” value)… But will the set class behave the same way? Is it safe to keep a set around, adding and removing stuff from it over time?
Edit: Alright, I’ve tried it out, and here’s what I found:
    >>> import gc
    >>> gc.collect()
    0
    >>> nums = range(1000000)
    >>> gc.collect()
    0
    ### rsize: 20 megs
    ### A baseline measurement
    >>> s = set(nums)
    >>> gc.collect()
    0
    ### rsize: 36 megs
    >>> for n in nums: s.remove(n)
    >>> gc.collect()
    0
    ### rsize: 36 megs
    ### Memory usage doesn't drop after removing every item from the set…
    >>> s = None
    >>> gc.collect()
    0
    ### rsize: 20 megs
    ### … but nulling the reference to the set *does* free the memory.
    >>> s = set(nums)
    >>> for n in nums: s.remove(n)
    >>> for n in nums: s.add(n)
    >>> gc.collect()
    0
    ### rsize: 36 megs
    ### Removing then re-adding keys uses a constant amount of memory…
    >>> for n in nums: s.remove(n)
    >>> for n in nums: s.add(n+1000000)
    >>> gc.collect()
    0
    ### rsize: 47 megs
    ### … but adding new keys uses more memory.
Yes, `set` is basically a hash table just like `dict`; the differences at the interface don’t imply many differences “below” it. Once in a while, you should copy the set (`myset = set(myset)`), just like you should for a dict on which many additions and removals are regularly made over time.
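A rough way to see the effect of copying, as a sketch rather than a benchmark: it assumes CPython, where `sys.getsizeof` on a set reports the size of its hash table (not of the items stored in it). The size `N` and the variable names are arbitrary choices for illustration.

```python
import sys

N = 100_000  # arbitrary size, chosen only for illustration

s = set(range(N))
size_full = sys.getsizeof(s)

# Remove all but a handful of items. On CPython the table is not
# shrunk on removal, so the reported size stays where it was.
for n in range(10, N):
    s.remove(n)
size_after_removals = sys.getsizeof(s)

# Copying builds a fresh table sized for the items actually left.
s = set(s)
size_after_copy = sys.getsizeof(s)

print(size_full, size_after_removals, size_after_copy)
```

On CPython the first two numbers should match and the third should be far smaller. Other implementations may behave differently, so treat the exact figures as incidental.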