I am working on a financial database that I need to develop caching for. I have a MySQL database with a lot of raw, realtime data. This data is then provided over a HTTP API using Flask (Python).
Before the raw data is returned it is manipulated by my python code. This manipulation can involve a lot of data, therefore a caching system is in order.
-
The cached data never changes. For example, if someone queries for data for a time range of 2000-01-01 till now, the data will get manipulated, returned and stored in the cache as being the specifically manipulated data from 2000-01-01 till now. If the same manipulated data is queried again later, the cache will retrieve the values from 2000-01-01 till the last time it was queried, elimination the need for manipulation for that entire period. Then, it will manipulate the new data from that point till now, and add that to the cache too.
-
The data size shouldn’t be enormous (under 5GB I would say at max).
-
I need to be able to retrieve from the cache using date ranges.
Which DB should I be looking it? MongoDB? Redis? CouchDB?
Thanks!
Using BigData solution for such a small data set seems like a waste and might still not yell the required latency.
It seems like what you need is not one of the BigData solution like MongoDB or CouchDB but a distributed Caching (or In Memory Data Grid).
One of the leading solution which (which I’m one of its contributors) seems like a perfect match for you needs is XAP Elastic Caching.
For more details see: http://www.gigaspaces.com/datagrid
And you can find a post describing exactly this case on how you can use DataGrid to scale MySQL: “Scaling MySQL” – http://www.gigaspaces.com/mysql