I am developing two iOS applications that rely heavily on MySQL databases. Each application has its own API, which runs the relevant queries against the MySQL databases.
The queries vary from being simple, user or ‘object’ based:
SELECT `username`, `id`, `full_name` FROM `users` WHERE `id` = 1
INSERT INTO `users` (`full_name`, `username`, `email`, `password`, `signup_method`, `latitude`, `longitude`) VALUES (?, ?, ?, ?, ?, ?, ?)
SELECT q.*, (SELECT COUNT(a.qid) FROM answers as a WHERE qid=q.id) AS a_count FROM questions as q ORDER BY a_count DESC LIMIT 1, 10
to location based:
SELECT ( 6371 * acos( cos( radians(?) ) * cos( radians( latitude ) ) * cos( radians( longitude ) - radians(?) ) + sin( radians(?) ) * sin( radians( latitude ) ) ) ) AS distance FROM `users` HAVING distance <= 5 ORDER BY points DESC
SELECT * , (6371 * acos(cos(radians(latitude)) * cos(radians({$values['latitude']})) * cos(radians({$values['longitude']}) - radians(longitude)) + sin(radians(latitude)) * sin(radians({$values['latitude']})))) AS distance FROM `questions` HAVING distance <= ? ORDER by distance LIMIT ?,?
These queries obviously take time, especially the location-based ones, which are computationally expensive.
Many services use caching layers alongside their databases to improve performance. For example:
- Memcached
- Redis
- and more.
My question is: when, with regard to queries, should caching be used, and what are the benefits of using caching?
Thanks,
Max!
You should cache simply when it’s cheaper to cache than it is to generate the results from scratch.
This cost depends on things like:
But always, start at the source. Have you examined MySQL’s slow-query-log, to see which queries are costly? It can help you see where you’re missing important indices, and which queries take unexpectedly long.
pt-query-digest from the Percona Toolkit can help by summarizing this log file. Optimize your databases before you start caching.
Looking at your types of queries, it seems to me that caching the results, and even pre-heating the cache, is well worth it.
The choice of cache is an important one of course. I assume you’re already using MySQL’s built-in query-cache (available in MySQL 5.7 and earlier; it was removed in MySQL 8.0)? Make sure it’s enabled and that it has enough memory assigned to it. Simple queries like the ‘SELECT username’ one are cheap anyway, but are also easily cached by MySQL itself. There are a lot of limits to built-in query-caching though, and a lot of reasons that queries are not cached or caches are flushed. For example, queries that use non-deterministic functions such as NOW() or RAND() are simply skipped, and because a cached result is only reused for an identical query, queries whose values change on every request (like your location-based queries) will rarely get a hit. Read the docs.
Using a cache like Redis allows for far more control over what to cache, for how long, and how to expire it. There are many ideas on how to implement this and they depend on your application as well. Have a look around the net.
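To make the "what to cache, for how long, and how to expire it" point a bit more concrete, here is a minimal cache-aside sketch. It is only an illustration under assumptions: `TTLCache` is a hypothetical in-memory stand-in for Redis or Memcached, and `nearby_key` and its two-decimal rounding are made-up choices, not something from your application. The idea is to look in the cache first, run the expensive query only on a miss, and store the result with a time-to-live.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Hypothetical in-memory stand-in for an external cache such as Redis."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (expiry timestamp, cached value)
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, ttl_seconds: float,
                       compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit that has not expired yet
        value = compute()    # miss or expired: run the expensive query
        self._store[key] = (now + ttl_seconds, value)
        return value

    def invalidate(self, key: str) -> None:
        # Explicit expiry, e.g. after a write that changes the result.
        self._store.pop(key, None)


def nearby_key(latitude: float, longitude: float, radius_km: int) -> str:
    # Assumption: rounding to two decimals lets nearby requests share
    # one cache entry; pick a precision that suits your data.
    return f"nearby:{latitude:.2f}:{longitude:.2f}:{radius_km}"
```

You would call something like `cache.get_or_compute(nearby_key(lat, lng, 5), 60, run_location_query)`, where `run_location_query` is whatever function executes your SQL. Whether 60 seconds of staleness is acceptable depends entirely on your application.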
I’d suggest enabling the query-cache, simply because it’s easy and cheap and will help a bit, and I’d definitely look at implementing an in-memory caching layer for your database. Maybe an indexing server, like Solr, which has built-in methods for location-based queries, is worth considering. We use it together with MySQL.
Memcached and Redis are good choices for caching. I’d personally pick Redis because it has more use-cases and optional persistence to disk, but that’s entirely up to you. Maybe your framework-of-choice has some existing components that you can use in your application.
Another tip: measure everything. You only know what to optimize or cache if you know what takes time. Also, the results of your optimizations will only be clear if you measure again. Implement something like statsd and measure the various events and timings in your application. Better too much than not enough. Graph the results and analyze them over time. You’ll be surprised what turns up.
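As a rough sketch of what "measure everything" can look like in code, the snippet below times a block and records the duration under a metric name. It is an assumption-laden illustration: `timed`, `summarize`, and the in-process `timings_ms` dictionary are hypothetical helpers, and in a real setup you would send the timing to statsd (or a similar collector) instead of keeping it in memory.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import DefaultDict, Dict, Iterator, List

# Hypothetical in-process store; a statsd client would replace this.
timings_ms: DefaultDict[str, List[float]] = defaultdict(list)


@contextmanager
def timed(metric: str) -> Iterator[None]:
    """Record how long the wrapped block took, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        timings_ms[metric].append(elapsed_ms)


def summarize(metric: str) -> Dict[str, float]:
    """Return count, average and maximum for one metric."""
    samples = timings_ms.get(metric, [])
    return {
        "count": len(samples),
        "avg_ms": sum(samples) / len(samples) if samples else 0.0,
        "max_ms": max(samples, default=0.0),
    }
```

Wrapping each query, for example `with timed("questions.nearby"): ...`, and comparing the summaries before and after adding a cache shows whether the cache actually paid off.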