I’ve seen several database cache engines, all of them are pretty dumb (i.e.: keep this query cached for X minutes) and require that you manually delete the whole cache repository after a INSERT / UPDATE / DELETE query has been executed.
About 2 or 3 years ago I developed an alternative DB cache system for a project I was working on, the idea was basically to use regular expressions to find the table(s) involved in a particular SQL query:
$query_patterns = array
(
'INSERT' => '/INTO\s+(\w+)\s+/i',
'SELECT' => '/FROM\s+((?:[\w]|,\s*)+)(?:\s+(?:[LEFT|RIGHT|OUTER|INNER|NATURAL|CROSS]\s*)*JOIN\s+((?:[\w]|,\s*)+)\s*)*/i',
'UPDATE' => '/UPDATE\s+(\w+)\s+SET/i',
'DELETE' => '/FROM\s+((?:[\w]|,\s*)+)/i',
'REPLACE' => '/INTO\s+(\w+)\s+/i',
'TRUNCATE' => '/TRUNCATE\s+(\w+)/i',
'LOAD' => '/INTO\s+TABLE\s+(\w+)/i',
);
I know that these regexs probably have some flaws (my regex skills were pretty green back then) and obviously don’t match nested queries, but since I never use them that isn’t a problem for me.
Anyway, after finding the involved tables I would alphabetically sort them and create a new folder in the cache repository with the following naming convention:
+table_a+table_b+table_c+table_...+
In case of a SELECT query, I would fetch the results from the database, serialize() them and store them in the appropriate cache folder, so for instance the results of the following query:
SELECT `table_a`.`title`, `table_b`.`description` FROM `table_a`, `table_b` WHERE `table_a`.`id` <= 10 ORDER BY `table_a`.`id` ASC;
Would be stored in:
/cache/+table_a+table_b+/079138e64d88039ab9cb2eab3b6bdb7b.md5
The MD5 being the query itself. Upon a consequent SELECT query the results would be trivial to fetch.
In case of any other type of write query (INSERT, REPLACE, UPDATE, DELETE and so on) I would glob() all the folders that had +matched_table(s)+ in their name all delete all the file contents. This way it wouldn’t be necessary to delete the whole cache, just the cache used by the affected and related tables.
The system worked pretty well and the difference of performance was visible – although the project had many more read queries than write queries. Since then I started using transactions, FK CASCADE UPDATES / DELETES and never had the time to perfect the system to make it work with these features.
I’ve used MySQL Query Cache in the past but I must say the performance doesn’t even compare.
I’m wondering: am I the only one who sees beauty in this system? Is there any bottlenecks I may not be aware of? Why do popular frameworks like CodeIgniter and Kohana (I’m not aware of Zend Framework) have such rudimentary DB cache systems?
More importantly, do you see this as a feature worth pursuing? If yes, is there anything I could do / use to make it even faster (my main concerns are disk I/O and (de)serialization of query results)?
I appreciate all input, thanks.
I can see the beauty in this solution, however, I belive it only works for a very specific set of applications. Scenarios where it is not applicable include:
Databases which utilize cascading deletes/updates or any kind of triggers. E.g., your DELETE to table A may cause a DELETE from table B. The regex will never catch this.
Accessing the database from points which do not go through you cache invalidation scheme, e.g. crontab scripts etc. If you ever decide to implement replication across machines (introduce read-only slaves), it may also disturb the cache (because it does not go through cache invalidation etc.)
Even if these scenarios are not realistic for your case it does still answer the question of why frameworks do not implement this kind of cache.
Regarding if this is worth pursuing, it all depends on your application. Maybe you care to supply more information?