I have a dictionary object with about 60,000 keys that I cache and access in my Django view. The view provides basic search functionality where I look for a search term in the dictionary like so:
projects_map = cache.get('projects_map')
projects_map.get('search term')
However, just grabbing the cached object (the first line above) causes a giant spike in memory usage on the server, sometimes upwards of 100 MB, and the memory isn't released even after the values are returned and the template is rendered.
How can I keep the memory from jacking up like this? Also, I’ve tried explicitly deleting the object after I grab the value but even that doesn’t release the memory spike.
Any help is greatly appreciated.
Update: Solution I ultimately implemented
I decided to implement my own indexing table in which I store the keys and their pickled value. Now, instead of using get() on a dictionary, I use:
ProjectsIndex.objects.get(index_key=<search term>)
and unpickle the value. This seems to take care of the memory issue as I’m no longer loading a giant object into memory. It adds another small query to the page but that’s about it. Seems to be the perfect solution…for now.
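A minimal sketch of the same idea, using the standard-library `sqlite3` module as a stand-in for the Django ORM so that it runs on its own. The table name `projects_index` and the helpers `build_index` and `lookup` are made up for illustration; in the actual project the table would be the `ProjectsIndex` model, with `index_key` as a unique column and the pickled value in a second column.

```python
import pickle
import sqlite3

# Hypothetical stand-in for the ProjectsIndex model: one row per key,
# with the value stored as a pickled blob.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE projects_index (index_key TEXT PRIMARY KEY, value BLOB NOT NULL)"
)


def build_index(projects_map):
    """Write every dictionary entry as its own row."""
    rows = ((key, pickle.dumps(value)) for key, value in projects_map.items())
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO projects_index (index_key, value) VALUES (?, ?)",
            rows,
        )


def lookup(search_term, default=None):
    """Fetch and unpickle a single value without loading any other row."""
    row = conn.execute(
        "SELECT value FROM projects_index WHERE index_key = ?", (search_term,)
    ).fetchone()
    return default if row is None else pickle.loads(row[0])


build_index({"alpha": {"id": 1}, "beta": {"id": 2}})
print(lookup("alpha"))    # {'id': 1}
print(lookup("missing"))  # None
```

Only the one matching row is read and unpickled per request, so memory use no longer depends on the size of the whole dictionary.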
What about using a dedicated caching service, such as Redis or memcached, instead of loading the huge object into memory on the Python side? That way you would also be able to scale out to extra machines, should the dictionary grow further.
Anyway, the 100 MB covers all the data plus the hash index plus miscellaneous overhead. I noticed the other day that memory often doesn't get deallocated until you quit the Python process (I filled up a couple of gigabytes of memory from the Python interpreter by loading a huge JSON object :)). It would be interesting to hear whether anybody has a solution for that.
Update: caching with very little memory
Your options with only 512 MB of RAM are:
and, in the latter two cases, try splitting up your objects, so that you never retrieve megabytes of objects from the cache at once.
Update: lazy dict spanning multiple cache keys
You can replace your cached dict with something like this; this way, you can continue treating it as you would with a normal dictionary, but data will be loaded from cache only when you really need it.
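A minimal sketch of such a lazy dictionary, under a few assumptions: keys are strings, the big dictionary is split into a fixed number of buckets by `zlib.crc32`, and each bucket lives under its own cache key. The names `LazyCachedDict`, `store_in_buckets`, `bucket_for` and `NUM_BUCKETS` are hypothetical, and `FakeCache` is only an in-memory stand-in so the example runs by itself; in a Django view you would pass the real `cache` object, which offers the same `get`/`set` calls.

```python
import zlib
from collections.abc import Mapping

NUM_BUCKETS = 64  # assumed bucket count; tune it to the size of the data


def bucket_for(key, num_buckets=NUM_BUCKETS):
    """Stable bucket number for a string key (crc32 is identical in every process)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def store_in_buckets(cache, prefix, data, num_buckets=NUM_BUCKETS):
    """Split one big dict into num_buckets smaller dicts, one cache key each."""
    buckets = [{} for _ in range(num_buckets)]
    for key, value in data.items():
        buckets[bucket_for(key, num_buckets)][key] = value
    for number, bucket in enumerate(buckets):
        cache.set(f"{prefix}:{number}", bucket)


class LazyCachedDict(Mapping):
    """Read-only dict-like view that fetches a bucket only on first use."""

    def __init__(self, cache, prefix, num_buckets=NUM_BUCKETS):
        self._cache = cache
        self._prefix = prefix
        self._num_buckets = num_buckets
        self._loaded = {}  # bucket number -> dict, filled on demand

    def _bucket(self, number):
        if number not in self._loaded:
            self._loaded[number] = self._cache.get(f"{self._prefix}:{number}") or {}
        return self._loaded[number]

    def __getitem__(self, key):
        return self._bucket(bucket_for(key, self._num_buckets))[key]

    def __iter__(self):
        # Iterating (or calling len) pulls in every bucket, so avoid it in hot paths.
        for number in range(self._num_buckets):
            yield from self._bucket(number)

    def __len__(self):
        return sum(len(self._bucket(n)) for n in range(self._num_buckets))


class FakeCache:
    """In-memory stand-in exposing the get/set subset of Django's cache API."""

    def __init__(self):
        self._store = {}
        self.reads = 0

    def get(self, key, default=None):
        self.reads += 1
        return self._store.get(key, default)

    def set(self, key, value, timeout=None):
        self._store[key] = value


cache = FakeCache()
store_in_buckets(cache, "projects_map", {"alpha": 1, "beta": 2, "gamma": 3})
projects_map = LazyCachedDict(cache, "projects_map")
print(projects_map.get("alpha"))  # 1
print(cache.reads)                # 1 (only one bucket was fetched)
```

Because the class derives from `Mapping`, `get()`, `in`, `keys()` and `items()` work as on a normal dictionary, while a single lookup only unpickles roughly 1/64th of the data.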
And then replace the plain `cache.get('projects_map')` lookup in your view with an instance of that lazy dictionary.