I’m trying to solve a latency problem with a query on a MySQL 5.0 database.
- The query itself is extremely simple: `SELECT SUM(items) FROM tbl WHERE col = 'val'`
- There’s an index on `col`, and in the worst case there are no more than 10000 values to sum (the mean of `count(items)` over all values of `col` would be around 10).
- The table has up to 2M rows.
- The query runs often enough that its execution time occasionally climbs to 10s, although 99% of executions take well under 1s.
- The query is not really cacheable: in almost every case, a query like this one is followed by an insert into that table within the next minute, and showing stale values is out of the question (this is billing information).
- Key cache performance is fine: the hit rate is ~100%.
My goal is for every single query to take under 1s. Is there any way to improve the SELECT time without changing the table? Alternatively, are there changes that would help solve the problem? I thought about simply keeping a separate table holding the current sum for each value of `col`, updated right after every insert, but maybe there is a better way to do it?
Another approach is to add a summary table that holds the current total of `items` for each value of `col`.
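A minimal sketch of such a table in MySQL syntax. The table name `summary` comes from this answer, but the column name `total` and both column types are assumptions, since the question does not give the schema; `col` should have exactly the same type as `tbl.col`. If `tbl` uses InnoDB, creating `summary` as InnoDB too means a rolled-back write to `tbl` also rolls back the matching change to `summary`.

```sql
-- Hypothetical summary table: one row per distinct value of tbl.col.
-- VARCHAR(64) and BIGINT are assumed types; match them to tbl.col and tbl.items.
CREATE TABLE summary (
  col   VARCHAR(64) NOT NULL,
  total BIGINT      NOT NULL DEFAULT 0,  -- running SUM(items) for this col
  PRIMARY KEY (col)                      -- each lookup is a single primary-key read
);
```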
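Then add triggers to `tbl` that keep those totals in step:
- on insert: add the new row’s `items` to the total for its `col`;
- on delete: subtract the deleted row’s `items` from the total for its `col`;
- on update: subtract the old `items` from the old `col`’s total, then add the new `items` to the new `col`’s total.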
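One way to write these triggers, sketched under the assumptions that `tbl.col` and `tbl.items` are both `NOT NULL` and that there is a summary table `summary(col, total)` with `col` as its primary key (the column `total` and the trigger names are hypothetical). `DELIMITER` is a `mysql` command-line client directive, needed so the semicolons inside `BEGIN ... END` do not end the `CREATE TRIGGER` statement. MySQL 5.0 allows only one trigger per timing and event on a table, so these must not clash with existing triggers on `tbl`.

```sql
DELIMITER //

-- Insert: create the summary row on first use, otherwise add to it.
CREATE TRIGGER tbl_after_insert AFTER INSERT ON tbl
FOR EACH ROW
BEGIN
  INSERT INTO summary (col, total) VALUES (NEW.col, NEW.items)
    ON DUPLICATE KEY UPDATE total = total + NEW.items;
END//

-- Delete: remove the row's contribution from its col's total.
CREATE TRIGGER tbl_after_delete AFTER DELETE ON tbl
FOR EACH ROW
BEGIN
  UPDATE summary SET total = total - OLD.items WHERE col = OLD.col;
END//

-- Update: take out the old contribution, then add the new one.
-- This also covers rows whose col value changes.
CREATE TRIGGER tbl_after_update AFTER UPDATE ON tbl
FOR EACH ROW
BEGIN
  UPDATE summary SET total = total - OLD.items WHERE col = OLD.col;
  INSERT INTO summary (col, total) VALUES (NEW.col, NEW.items)
    ON DUPLICATE KEY UPDATE total = total + NEW.items;
END//

DELIMITER ;
```

Inside `ON DUPLICATE KEY UPDATE`, `total` refers to the value already stored in the existing row, so the expression increments it rather than overwriting it.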
This will slow down your inserts (and updates and deletes), but lets each lookup hit a single row in the summary table instead of summing up to 10000 rows in `tbl`.
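With the triggers in place, the original `SUM` query can become a primary-key lookup. This sketch assumes a summary table `summary(col, total)` (the column name `total` is hypothetical). Because the triggers run as part of the statement that writes to `tbl`, the total changes together with the underlying rows, so there is no stale-cache window. One behavioural difference: a value of `col` that has never been inserted returns no row here, whereas `SUM` returns one row containing `NULL`.

```sql
-- Replaces: SELECT SUM(items) FROM tbl WHERE col = 'val';
SELECT total FROM summary WHERE col = 'val';
```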
The biggest problem with this is bootstrapping the values of the summary table. If you can take the application offline, you can easily initialise summary with values from tbl.
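With the application offline (so nothing writes to `tbl`), the initial values can be filled in with one grouped query. This sketch assumes `summary` has just been created and is empty, and uses the hypothetical `total` column; installing the triggers before bringing the application back keeps the totals current from then on.

```sql
-- Run while nothing writes to tbl; summary is assumed to be empty.
-- One row per distinct col, equal to what SUM(items) would return for it.
INSERT INTO summary (col, total)
  SELECT col, SUM(items)
  FROM tbl
  GROUP BY col;
```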
However, if you need to keep the service running, it is a lot more difficult. If you have a replica, you can stop replication, build the summary table, install the triggers, restart replication, fail the service over to the replica, and then repeat the process on the retired primary.
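On the replica, the steps might look like the following sketch, assuming a freshly created, empty `summary(col, total)` table (hypothetical column name). `STOP SLAVE SQL_THREAD` pauses applying replicated changes while the I/O thread keeps downloading the primary’s binary log, so `tbl` stays static during the rebuild. Replication in MySQL 5.0 is statement-based, so once replication resumes, the replicated writes re-executed on the replica fire its triggers.

```sql
-- On the replica only.
STOP SLAVE SQL_THREAD;   -- stop applying changes; the relay log keeps filling

-- Build the totals from the now-static tbl.
INSERT INTO summary (col, total)
  SELECT col, SUM(items) FROM tbl GROUP BY col;

-- Create the insert, delete and update triggers on tbl at this point.

START SLAVE SQL_THREAD;  -- replay the backlog; the triggers see every change
```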
If you cannot do that, you could instead rebuild the summary table one value of `col` at a time to reduce the impact.
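A sketch of the per-value rebuild, assuming the triggers are installed first so that every change made after a value is rebuilt is captured, and using a hypothetical `summary(col, total)` table. Until every value has been processed, `summary` is incomplete, so the application should keep running the original `SUM` query. `LOCK TABLES tbl READ` makes writes to `tbl` wait (other sessions can still read it) only while one value is being summed. `LOCK TABLES` cannot be used inside stored procedures, so the loop over values has to be driven from a client script.

```sql
-- Repeat for each value of col (e.g. from SELECT DISTINCT col FROM tbl),
-- substituting it for 'val'. The triggers must already be installed.
LOCK TABLES tbl READ, summary WRITE;  -- writes to tbl wait; reads continue

-- Drop whatever partial total the triggers accumulated for this value...
DELETE FROM summary WHERE col = 'val';

-- ...and store the exact sum instead (no row if 'val' no longer exists).
INSERT INTO summary (col, total)
  SELECT col, SUM(items) FROM tbl WHERE col = 'val' GROUP BY col;

UNLOCK TABLES;
```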
Or, if you can tolerate a prolonged outage, you can rebuild the whole summary table in a single pass.
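A sketch of the single-pass version, assuming the triggers are already installed and a hypothetical `summary(col, total)` table. Writes to `tbl` are blocked for the whole rebuild, which is the prolonged outage; reads of `tbl` can continue.

```sql
-- Blocks every write to tbl until UNLOCK TABLES.
LOCK TABLES tbl READ, summary WRITE;

DELETE FROM summary;  -- discard partial totals left by the triggers

INSERT INTO summary (col, total)
  SELECT col, SUM(items) FROM tbl GROUP BY col;

UNLOCK TABLES;
```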