I have a Java web service that queries a DB to return data to users. DB queries are expensive, so I have a cron job that runs every 60 seconds to cache the current data in memcached.
Data elements ‘close’ after a time, meaning they are no longer returned by “get current data” requests, so those requests can be served from the cached data.
Clients use a feature called ‘since’ to get all the data that has changed since a particular timestamp (the last request’s timestamp). This also returns any closed data if that data closed after that timestamp.
How can I effectively store the diffs/since data? Accessing the DB for every since request is too slow (and won’t scale well), but because clients can request any since time, it is difficult to build an all-purpose cache.
I tried having the cron job also build a since cache. It would make ‘since’ requests to get everything that changed since the last update, and I attempted to force clients to request the timestamps that matched the cron job’s since requests. But the cron job takes a varying amount of time to run, and neither the client nor the cron job runs exactly every 60 seconds, so the small differences add up. Eventually some data closes and either the cache or the client misses it.
I’m not even sure what to search for to solve this.
I’d be tempted to stick a time-expiring cache (e.g. ehcache with timeToLive set) in front of the database and have whatever process updates the database also put the data directly into the cache (resetting or removing any existing matching element). The web service then just hits the cache (which is incredibly fast) on everything except its initial connection, filtering out the few elements that are too old and sending the rest on to the client. Gradually the old data gets dropped from the cache as its time to live passes. Then just make sure the cache gets pre-populated when the service starts up.
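To make that concrete, here is a rough sketch of the idea in plain Java. It is not ehcache code: a ConcurrentHashMap with a manual time-to-live check stands in for the real cache, and every name in it (SinceCache, Entry, changedSince, covers) is made up for illustration. It also assumes each element carries the timestamp of its last change (including the change to ‘closed’), which the question does not actually confirm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a time-expiring cache such as ehcache with timeToLive. */
class SinceCache {

    /** One cached data element; the fields are assumptions, not the asker's schema. */
    static final class Entry {
        final String id;
        final String payload;
        final boolean closed;
        final long changedAtMillis; // when the element last changed (or closed)

        Entry(String id, String payload, boolean closed, long changedAtMillis) {
            this.id = id;
            this.payload = payload;
            this.closed = closed;
            this.changedAtMillis = changedAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long timeToLiveMillis;

    SinceCache(long timeToLiveMillis) {
        this.timeToLiveMillis = timeToLiveMillis;
    }

    /** Called by whatever process writes to the database; replaces any older entry with the same id. */
    void put(Entry entry) {
        entries.put(entry.id, entry);
    }

    /** True if the cache still holds every change after sinceMillis; otherwise fall back to the DB. */
    boolean covers(long sinceMillis, long nowMillis) {
        return sinceMillis >= nowMillis - timeToLiveMillis;
    }

    /** Returns every element that changed strictly after sinceMillis, closed ones included. */
    List<Entry> changedSince(long sinceMillis, long nowMillis) {
        evictExpired(nowMillis);
        List<Entry> result = new ArrayList<>();
        for (Entry entry : entries.values()) {
            if (entry.changedAtMillis > sinceMillis) {
                result.add(entry);
            }
        }
        return result;
    }

    /** Drops entries whose last change is older than the time to live. */
    private void evictExpired(long nowMillis) {
        entries.values().removeIf(e -> nowMillis - e.changedAtMillis > timeToLiveMillis);
    }
}
```

Because the filter runs against each element’s own change timestamp, clients can pass any since value inside the time-to-live window and nothing depends on the cron job and the clients staying in step. A since value older than the window (the covers check returns false) is the one case that would still need the database, much like the initial connection.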