I have a simple web service which serves XML datasets (they can be as big as 250MB). This data comes from complex queries executed against a database. To speed up the service, I would like to cache the results of some of the queries. However, I have a limited amount of RAM (~2GB), and I don’t know in advance which XML dataset will be requested most often. In addition, this can change over time (e.g. yesterday dataset X was the most requested, tomorrow it may be dataset Y).
I would like an “intelligent” cache algorithm that caches the datasets most likely to be requested. In this case, I cannot simply use counters and cache the most often requested piece of data. I need some sort of time decay of the number of requests.
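You can use LRU (least recently used). Every time you access something that is not in the cache, replace the item in the cache that was used longest ago with the new one, set the new item’s age to 0, and increment all other ages. Every time you have a cache hit, reset that element’s age to 0 and increment all others. Equivalently, you can store the current timestamp on each access and evict the entry with the oldest timestamp, which avoids updating every other entry.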
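Here is a minimal sketch of that idea in Python, adapted to the memory limit in the question: the cache is bounded by total bytes rather than by number of entries, since the datasets vary a lot in size. The names `ByteBudgetLRU` and `max_bytes` are made up for illustration, and the sketch assumes each dataset is held as a `bytes` object and that only one thread touches the cache. Instead of explicit ages, it relies on the ordering of `collections.OrderedDict`, which plays the same role as the timestamp.

```python
from collections import OrderedDict


class ByteBudgetLRU:
    """LRU cache bounded by total payload size (hypothetical helper)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        # Keys are kept in access order: least recently used first.
        self._entries = OrderedDict()

    def get(self, key):
        data = self._entries.get(key)
        if data is not None:
            # Cache hit: mark this entry as the most recently used.
            self._entries.move_to_end(key)
        return data

    def put(self, key, data):
        # Drop any previous value stored under this key.
        if key in self._entries:
            self.used_bytes -= len(self._entries.pop(key))
        if len(data) > self.max_bytes:
            return  # can never fit; the caller serves it uncached
        # Evict least recently used entries until the new one fits.
        while self.used_bytes + len(data) > self.max_bytes:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)
        self._entries[key] = data
        self.used_bytes += len(data)
```

On a request you would call `get(query_key)` first and, on a miss, run the query and `put` the result. With ~2GB of RAM, `max_bytes` would need to sit somewhat below that to leave room for the rest of the process; the right value is something to measure rather than assume.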
Note: LRU is often used as an approximation of the optimal algorithm, which requires oracular knowledge: replace the item that will not be used for the longest time. LRU works well when locality is good, and it does not suffer from Belady’s anomaly.