While reading the documentation for Python's re module, I decided to have a look at the re.py source code.
When I opened it, I found this:
_cache = {}
_MAXCACHE = 100

def _compile(*key):
    cachekey = (type(key[0]),) + key
    p = _cache.get(cachekey)
    if p is not None:
        return p
    # ... code irrelevant to the question is skipped here ...
    if len(_cache) >= _MAXCACHE:
        _cache.clear()
    _cache[cachekey] = p
    return p
Why is the cache cleared with _cache.clear() when it reaches _MAXCACHE entries?
Is it a common approach to clear the cache completely and start from scratch?
Why not delete only the value that was cached the longest time ago?
If I had to guess, I’d say it was done this way to avoid having to keep track of when, or for how long, individual values had been stored in the cache, which would create both memory and processing overhead. The caching object is a plain dictionary, which is unordered (before Python 3.7, dictionaries made no guarantee about insertion order), so there’s no good way to know what order items were added in without some other bookkeeping object as well. This could be addressed by using an OrderedDict in place of a standard dictionary, assuming you’re working with Python >= 2.7; otherwise, you’d need to significantly redesign the way the caching is implemented in order to eliminate the need for clear().