I’m creating a game that awards points for doing little things, so I have a schema like this:
create table points (
id int,
points int,
reason varchar(10)
)
and getting the number of points a user has is trivial:
select sum(points) as total from points where id = ?
However, performance has become more and more of an issue as the points table grows. I want to do something like:
create table pointtotal (
id int,
totalpoints int
)
What is the best practice for keeping the two tables in sync? Do I try to update pointtotal on every change? Do I run a daily script?
(Assume I have the right keys; they were left out for brevity.)
Edit:
Here are some characteristics I left out that should help:
Inserts/updates to points are not all that frequent.
There are a large number of entries and a large number of requests; the keys are pretty trivial, as you can see.
The best practice is to use a normalized database schema. Then the DBMS computes the total from the points rows every time you ask for it, so it is always up to date and you don’t have to maintain it.
But I understand the tradeoff that makes a denormalized design attractive. In that case, the best practice is to update the total on every change. Investigate triggers. The advantage of this practice is that the total stays in sync with the changes, so you never have to wonder whether it’s out of date. When a change is committed, the updated total is committed in the same transaction.
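As a minimal sketch of the trigger approach, here it is using Python's built-in sqlite3 module as a stand-in for the real database. The “Using index” note mentioned later suggests the question uses MySQL, whose CREATE TRIGGER syntax differs slightly, so treat this as an illustration of the idea rather than drop-in DDL. The trigger names, the PRIMARY KEY on pointtotal.id, the NOT NULL constraints, and the helper names `make_db` and `total_for` are all assumptions.

```python
import sqlite3

# Schema from the question. PRIMARY KEY on pointtotal.id and the NOT NULL
# constraints are assumptions (the question omits keys for brevity).
SCHEMA = """
CREATE TABLE points (
    id     INTEGER NOT NULL,
    points INTEGER NOT NULL,
    reason VARCHAR(10)
);
CREATE TABLE pointtotal (
    id          INTEGER PRIMARY KEY,
    totalpoints INTEGER NOT NULL
);

-- New row: make sure a total row exists, then add the new points to it.
CREATE TRIGGER points_ai AFTER INSERT ON points
BEGIN
    INSERT OR IGNORE INTO pointtotal (id, totalpoints) VALUES (NEW.id, 0);
    UPDATE pointtotal SET totalpoints = totalpoints + NEW.points WHERE id = NEW.id;
END;

-- Changed row: remove the old contribution, add the new one
-- (this also covers a row moving from one id to another).
CREATE TRIGGER points_au AFTER UPDATE ON points
BEGIN
    UPDATE pointtotal SET totalpoints = totalpoints - OLD.points WHERE id = OLD.id;
    INSERT OR IGNORE INTO pointtotal (id, totalpoints) VALUES (NEW.id, 0);
    UPDATE pointtotal SET totalpoints = totalpoints + NEW.points WHERE id = NEW.id;
END;

-- Deleted row: remove its contribution.
CREATE TRIGGER points_ad AFTER DELETE ON points
BEGIN
    UPDATE pointtotal SET totalpoints = totalpoints - OLD.points WHERE id = OLD.id;
END;
"""


def make_db():
    """Create an in-memory database with the tables and triggers above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def total_for(conn, user_id):
    """Read the maintained total; a user with no rows yet has 0 points."""
    row = conn.execute(
        "SELECT totalpoints FROM pointtotal WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else 0


conn = make_db()
with conn:  # one transaction: the points rows and the totals commit together
    conn.execute("INSERT INTO points VALUES (1, 10, 'login')")
    conn.execute("INSERT INTO points VALUES (1, 5, 'post')")
    conn.execute("INSERT INTO points VALUES (2, 7, 'login')")
with conn:
    conn.execute("UPDATE points SET points = 8 WHERE id = 1 AND reason = 'post'")
    conn.execute("DELETE FROM points WHERE id = 2")
print(total_for(conn, 1), total_for(conn, 2))  # 18 0
```

Reading a total is now a single-row primary-key lookup instead of a SUM over all of the user's rows; the cost moves to every write, which fits the stated workload of infrequent inserts and updates.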
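However, this approach has a weakness with concurrent changes: every transaction that changes a user’s points must also update that user’s single pointtotal row, so concurrent transactions touching the same total have to wait for each other. If you need to accommodate concurrent changes to the same totals, and you can tolerate the totals being “eventually consistent,” then recalculate the totals periodically instead, so you can be sure only one process at a time is changing them.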
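A periodic recalculation can be as simple as rebuilding pointtotal from a GROUP BY in one transaction. The sketch below again uses sqlite3 as a stand-in; `rebuild_totals` is a hypothetical name, and how the job is scheduled (cron or similar) is assumed and not shown.

```python
import sqlite3


def rebuild_totals(conn):
    """Recompute every user's total from scratch in a single transaction.

    Intended to be run by exactly one scheduled job, so only one process
    ever writes pointtotal. Readers see the old totals until the commit.
    """
    with conn:  # commits on success, rolls back if anything fails
        conn.execute("DELETE FROM pointtotal")
        conn.execute(
            "INSERT INTO pointtotal (id, totalpoints) "
            "SELECT id, SUM(points) FROM points GROUP BY id"
        )


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE points (id INTEGER, points INTEGER, reason VARCHAR(10));
CREATE TABLE pointtotal (id INTEGER PRIMARY KEY, totalpoints INTEGER);
""")
with conn:
    conn.executemany(
        "INSERT INTO points VALUES (?, ?, ?)",
        [(1, 10, "login"), (1, 5, "post"), (2, 7, "login")],
    )

rebuild_totals(conn)
rows = conn.execute("SELECT id, totalpoints FROM pointtotal ORDER BY id").fetchall()
print(rows)  # [(1, 15), (2, 7)]
```

Between runs the totals lag behind the points table, which is exactly the “eventually consistent” tradeoff described above.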
Another good practice is to cache aggregate totals outside the database, e.g. in memcached or in application variables, so you don’t have to hit the database every time you need to display the value.
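The caching idea is usually done as cache-aside with an expiry time. Here is a minimal sketch in which a process-local dict stands in for memcached; the class `TotalCache`, its `load_total` callable (which would run the real SUM query), and the 60-second default TTL are all hypothetical choices for illustration.

```python
import time


class TotalCache:
    """Cache-aside store for per-user totals with a time-to-live.

    `load_total` is a hypothetical callable that queries the database,
    e.g. SELECT SUM(points) FROM points WHERE id = ?. A dict stands in
    for memcached; `clock` is injectable so expiry can be tested.
    """

    def __init__(self, load_total, ttl_seconds=60.0, clock=time.monotonic):
        self._load_total = load_total
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # user_id -> (total, expires_at)

    def get(self, user_id):
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[1] > now:
            return entry[0]  # fresh hit: no database round trip
        total = self._load_total(user_id)  # miss or expired: ask the database
        self._entries[user_id] = (total, now + self._ttl)
        return total

    def invalidate(self, user_id):
        """Drop a cached total, e.g. right after awarding points."""
        self._entries.pop(user_id, None)


calls = []


def fake_load(user_id):
    calls.append(user_id)  # records each simulated database hit
    return 42


fake_now = [0.0]
cache = TotalCache(fake_load, ttl_seconds=60, clock=lambda: fake_now[0])
cache.get(1)
cache.get(1)  # served from the cache
fake_now[0] = 61.0
cache.get(1)  # entry expired, so the database is queried again
print(len(calls))  # 2
```

Calling `invalidate` whenever points are awarded keeps the displayed value fresh; the TTL only bounds staleness if an invalidation is missed.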
The query “select sum(points) as total from points where id = ?” should not take 2 seconds, even if you have a huge number of rows and a lot of requests. If you have a covering index defined over (id, points), then the query can produce the result without reading data from the table at all; it can calculate the total by reading values from the index itself. Use EXPLAIN to analyze your query and look for the “Using index” note in the Extra column.
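The covering-index effect is easy to check. This sketch uses SQLite, where EXPLAIN QUERY PLAN reports “USING COVERING INDEX” as its counterpart to MySQL’s “Using index” note; the index name `points_id_points` is an assumption. In MySQL, the equivalent is to add an index on (id, points) and run EXPLAIN on the same SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (id INTEGER, points INTEGER, reason VARCHAR(10))")
# Composite index: id serves the WHERE lookup, and points is stored in the
# index too, so SUM(points) never has to visit the table rows.
conn.execute("CREATE INDEX points_id_points ON points (id, points)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?, ?)",
    [(1, 10, "login"), (1, 5, "post"), (2, 7, "login")],
)

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(points) AS total FROM points WHERE id = ?", (1,)
).fetchall()
details = [row[-1] for row in plan]  # human-readable detail is the last column
print(details)  # should mention USING COVERING INDEX points_id_points

total = conn.execute(
    "SELECT SUM(points) AS total FROM points WHERE id = ?", (1,)
).fetchone()[0]
print(total)  # 15
```

If the plan reports a full table scan instead, the index is not being used, and that, rather than the lack of a summary table, is the first thing to fix.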