I apologize in advance if this question is too specific, but I think that it is a fairly typical scenario: join and group bys bogging down the db and the best way to get around it. My specific problem is that I need to create a scoreboard based on:
- play (userID, gameID, score): 40M rows
- games (gameid): 100K rows
- app_game (appID, gameID): the games are grouped into apps, and each app has a total score that is the sum over all its associated games; <20 rows
Users can play multiple times, and their best score for each game is the one used. Formulating the query is easy; I’ve tried several variations, but under load they have a nasty tendency to sit in the “Copying to tmp table” state for 30-60 seconds.
What can I do? Are there server variables that I should be tweaking, or is there a way to reformulate the query to make it faster? The derived-table version of the query that I’m using is as follows (minus a join to the user table to grab the name):
select userID, sum(score) as cumscore
from (
    select userID, gameID, max(p.score) as score
    from play p join app_game ag using (gameID)
    where ag.appID = 1 and p.score > 0
    group by userID, gameID
) app_stats
group by userID
order by cumscore desc
limit 0, 20;
Or as a temp table:
drop table if exists app_stats;
create temporary table app_stats
    select userID, gameID, max(p.score) as score
    from play p join app_game ag using (gameID)
    where ag.appID = 1 and p.score > 0
    group by userID, gameID;
select userID, sum(score) as cumscore
from app_stats
group by userID
order by cumscore desc
limit 0, 20;
I have indexes as follows:
show indexes from play;
+-------+------------+----------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+
| play | 0 | PRIMARY | 1 | playID | A | 38353712 | NULL | NULL | | BTREE | |
| play | 0 | uk_play_uniqueID | 1 | uniqueID | A | 38353712 | NULL | NULL | YES | BTREE | |
| play | 1 | play_score_added | 1 | dateTimeFinished | A | 19176856 | NULL | NULL | YES | BTREE | |
| play | 1 | play_score_added | 2 | score | A | 19176856 | NULL | NULL | | BTREE | |
| play | 1 | fk_playData_game | 1 | gameID | A | 76098 | NULL | NULL | | BTREE | |
| play | 1 | user_hiscore | 1 | userID | A | 650062 | NULL | NULL | YES | BTREE | |
| play | 1 | user_hiscore | 2 | score | A | 2397107 | NULL | NULL | | BTREE | |
+-------+------------+----------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+
I suspect that both versions, the one that creates the temp table and the do-everything-at-once query, basically need to go through all the matching data in your play table. If you have a lot of data, that’s just going to take a while.
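One way to check that suspicion is to ask MySQL for the query plan. This is only a sketch that wraps the derived-table query from the question in EXPLAIN; no output is shown because it depends on the server version and the data.

```sql
-- Ask the optimizer how it plans to run the scoreboard query.
EXPLAIN
SELECT userID, SUM(score) AS cumscore
FROM (
    SELECT userID, gameID, MAX(p.score) AS score
    FROM play p
    JOIN app_game ag USING (gameID)
    WHERE ag.appID = 1 AND p.score > 0
    GROUP BY userID, gameID
) app_stats
GROUP BY userID
ORDER BY cumscore DESC
LIMIT 0, 20;
```

If the Extra column reports "Using temporary" and "Using filesort", and the rows estimate for play is large, that would be consistent with the query building a big temporary table on every call.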
I’d maintain a separate summary table with the ID and total score for each player. Whenever you update the play table, also update the summary table. If they get out of sync, just drop the summary table and re-create the data from the play table. (Or, if you already use Redis in your infrastructure, you could maintain the summary there; its sorted sets make this particular thing really fast.)
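A rough sketch of the summary-table idea follows. It is a variation rather than exactly the table described above: because only the best score per game counts, it keeps one row per user and game instead of a single running total, which keeps the update a simple upsert. The table name user_game_best, the column bestScore, the index name, and the INT column types are all assumptions; only play, app_game, userID, gameID, and score come from the question.

```sql
-- Hypothetical summary table: one row per (user, game) with the best score.
CREATE TABLE user_game_best (
    userID    INT NOT NULL,   -- assumed type
    gameID    INT NOT NULL,   -- assumed type
    bestScore INT NOT NULL,   -- assumed type
    PRIMARY KEY (userID, gameID),
    KEY idx_game_user_score (gameID, userID, bestScore)
);

-- Run alongside every insert into play (42, 7, 1500 are placeholder values).
-- Keeps the larger of the stored score and the new one.
INSERT INTO user_game_best (userID, gameID, bestScore)
VALUES (42, 7, 1500)
ON DUPLICATE KEY UPDATE bestScore = GREATEST(bestScore, VALUES(bestScore));

-- Scoreboard for one app, read from the summary instead of the 40M-row table.
SELECT b.userID, SUM(b.bestScore) AS cumscore
FROM user_game_best b
JOIN app_game ag USING (gameID)
WHERE ag.appID = 1
GROUP BY b.userID
ORDER BY cumscore DESC
LIMIT 20;

-- If the two tables drift apart, rebuild the summary from play.
TRUNCATE TABLE user_game_best;
INSERT INTO user_game_best (userID, gameID, bestScore)
SELECT userID, gameID, MAX(score)
FROM play
WHERE score > 0 AND userID IS NOT NULL
GROUP BY userID, gameID;
```

The scoreboard query still groups and sorts, but over at most one row per user and game, so the temporary table it needs should be far smaller. If that is still too slow, a further per-user, per-app total table could be maintained the same way.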