Hi, I’ve written a query that works:
SELECT `comments`.* FROM `comments`
RIGHT JOIN (SELECT MAX( id ) AS id, core_id, topic_id
FROM comments GROUP BY core_id, topic_id order by id desc) comm
ON comm.id = comments.id LIMIT 10
I want to know whether it is possible (and how) to rewrite it to get better performance.
Thanks
Method 1 – improving the original query
I am pretty sure that in this case an `INNER JOIN` will suffice; there is no reason to do a `RIGHT JOIN` (if an `id` exists in `comm`, it will also exist in `comments`). `INNER JOIN`s can result in better performance.

Moreover, you really want to push the `LIMIT 10` inside `comm` (incidentally, keeping it together with the `ORDER BY`), for two reasons.

First, applying `LIMIT 10` apart from the `ORDER BY` will not get you the ten most recently posted-to topics (the ordering of the `comm` subquery will not necessarily be preserved into the final result, which is what you are `LIMIT`ing).

Second, a `LIMIT` inside the innermost, aggregate subquery will encourage cost-based optimizers to favour nested loops (10, to be exact) over hash or merge joins (the 10 nested loops being by far the fastest for any respectably sized `comments` table).

So, your query should be rewritten to use an `INNER JOIN`, with both the `ORDER BY` and the `LIMIT 10` inside the `comm` subquery.
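A minimal sketch of such a rewrite follows. It is wrapped in Python's built-in `sqlite3` module only so that it can be run against a throwaway table; the SQL itself should carry over to MySQL. The `latest_id` alias, the `body` column, the helper names and the sample rows are all assumptions made for illustration, not part of your schema.

```python
import sqlite3

# INNER JOIN instead of RIGHT JOIN, with ORDER BY and LIMIT kept together
# inside the aggregate subquery. "latest_id" is an illustrative alias.
METHOD_1_QUERY = """
SELECT comments.*
FROM comments
INNER JOIN (
    SELECT MAX(id) AS latest_id, core_id, topic_id
    FROM comments
    GROUP BY core_id, topic_id
    ORDER BY latest_id DESC
    LIMIT 10
) AS comm ON comm.latest_id = comments.id
ORDER BY comments.id DESC
"""


def latest_comment_per_topic(conn):
    """Return the newest comment of each of the ten most recent topics."""
    return conn.execute(METHOD_1_QUERY).fetchall()


def demo_connection():
    """Build an in-memory table with made-up rows: 12 topics, 3 comments each."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comments ("
        "id INTEGER PRIMARY KEY, core_id INTEGER, topic_id INTEGER, body TEXT)"
    )
    rows = [(i, 1, i % 12, f"comment {i}") for i in range(1, 37)]
    conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?)", rows)
    return conn
```

The outer `ORDER BY comments.id DESC` is there because the ordering of the subquery is not guaranteed to survive the join.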
Finally, use `EXPLAIN` to see what the query is doing. Do not forget to check that you have created an index on `comments.id` to help with the `JOIN` nested loops.
Method 2 – a different approach
Note that while the rewritten query from Method 1 could still be faster than your original query, the innermost `comm` subquery may still turn out to be a significant bottleneck if it results in a full table scan of `comments`. This really depends on how smart the database is when it sees `GROUP BY`, `ORDER BY` and `LIMIT` together.

If `EXPLAIN` indicates that the subquery is doing a table scan, then you can try a combination of SQL and application-level logic to get the best performance, assuming that I have understood your requirement correctly and that you want to identify the ten most recent comments posted in ten different topics. Roughly: read comments in descending `id` order and have the application stop fetching as soon as it has seen ten distinct topics.

In most cases (that is, provided that your application’s database driver does not attempt to buffer all rows into memory for you before giving you back control from `execute`), this will only fetch a handful of rows, always using an index, with no table scans involved.
Method 3 – a hybrid approach
If ten seconds ago I knew what the ten most recent comments were, can I be smart about it when I ask the question again later on? Unless comments can be deleted from the database, the answer is yes, because I know that, when I ask the question again, all comment IDs of interest will be greater than or equal to the oldest comment ID I got in my last query.
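A minimal sketch of this incremental idea, again using Python's `sqlite3` purely so it can be run. The function name `poll_latest` and the `latest_id` alias are assumptions for illustration, and the sketch assumes comments are never deleted. The only change to the inner query is a lower bound on `id`, passed in as the bind parameter `:last_lowest_id`.

```python
import sqlite3

# Same shape as the Method 1 query, but the aggregate subquery only looks
# at comments whose id is at or above the bound saved from the last run.
INCREMENTAL_QUERY = """
SELECT comments.*
FROM comments
INNER JOIN (
    SELECT MAX(id) AS latest_id
    FROM comments
    WHERE id >= :last_lowest_id
    GROUP BY core_id, topic_id
    ORDER BY latest_id DESC
    LIMIT 10
) AS comm ON comm.latest_id = comments.id
ORDER BY comments.id DESC
"""


def poll_latest(conn, last_lowest_id=0):
    """Run the query and return (rows, bound to pass on the next call)."""
    rows = conn.execute(
        INCREMENTAL_QUERY, {"last_lowest_id": last_lowest_id}
    ).fetchall()
    # Rows come back in descending id order, so the last row carries the
    # lowest id. Keep the previous bound if nothing was returned.
    next_lowest_id = rows[-1][0] if rows else last_lowest_id
    return rows, next_lowest_id
```

The caller keeps the second return value between calls, for example in application memory or a cache, and passes it back in on the next poll.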
I can therefore rewrite the innermost query to be much, much more selective using an additional condition, `WHERE id >= :last_lowest_id`.

When you run the query for the very first time, use `0` for `:last_lowest_id`. The query will return up to 10 rows, in descending order. Inside your application, put aside the `id` of the last row, and reuse its value as `:last_lowest_id` the next time you run the query, and repeat (again, putting aside the `id` of the last row returned by the latest query, and so on). This will essentially make the query incremental, and extremely fast.

Example:
With `:last_lowest_id` set to `0`, the query returns 129, 100, 99, 88, 83, 79, 78, 75, 73, 70; put aside 70.
With `:last_lowest_id` set to `70`, the query returns 130, 129, 100, 99, 88, 83, 79, 78, 75, 73; put aside 73.
Method 4 – yet another approach
If you expect to perform `SELECT ... ORDER BY id DESC LIMIT 10` much more often than `INSERT`s into the `comments` table, consider putting a bit more work into the `INSERT` to make the `SELECT` faster. Thus, you can add an indexed `updated_at` column to your `topics` etc. table, and whenever you `INSERT` a comment into the `comments` table, also update the corresponding topic’s `updated_at` value to `NOW()`. You can then easily select the 10 most recently updated topics (a simple and short index scan on `updated_at` returning 10 rows), inner joining with the `comments` table to get the `MAX(id)` for those 10 topics (far more efficient than getting `MAX(id)` for all topics before picking the ten greatest, as in the original query and Method 1), then inner joining again on `comments` to get the rest of the column values for those 10.

I expect overall performance of Method 4 to be comparable with Methods 2 and 3. Method 4 will have to be used if you need to get arbitrary topics (e.g. by paginating them, `LIMIT 10 OFFSET 50`) or if topics or comments can be removed (no changes are necessary to support topic removal; to support comment removal properly, the topic’s `updated_at` should be updated on both comment `INSERT` and `DELETE` with the `created_at` value of the latest non-deleted comment for the topic).
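A minimal sketch of Method 4, once more in Python's `sqlite3` so that it can be run as-is. The schema is an assumption: a single `topics` table keyed by `id` (your `core_id`/`topic_id` pair would need adapting), a numeric `updated_at` supplied by the application in place of `NOW()`, and hypothetical helper names. Comment removal is not handled here.

```python
import sqlite3
import time

# Illustrative schema: topics carry an indexed updated_at column.
SCHEMA = """
CREATE TABLE topics (id INTEGER PRIMARY KEY, updated_at REAL NOT NULL DEFAULT 0);
CREATE INDEX topics_updated_at ON topics (updated_at);
CREATE TABLE comments (id INTEGER PRIMARY KEY, topic_id INTEGER NOT NULL, body TEXT);
CREATE INDEX comments_topic_id ON comments (topic_id, id);
"""

# Pick the most recently updated topics first, then MAX(id) for just those
# topics, then join back to comments for the remaining columns.
RECENT_QUERY = """
SELECT comments.*
FROM (
    SELECT MAX(c.id) AS latest_id
    FROM (
        SELECT id FROM topics
        ORDER BY updated_at DESC
        LIMIT :limit OFFSET :offset
    ) AS recent
    INNER JOIN comments AS c ON c.topic_id = recent.id
    GROUP BY recent.id
) AS latest
INNER JOIN comments ON comments.id = latest.latest_id
ORDER BY comments.id DESC
"""


def add_comment(conn, topic_id, body, now=None):
    """Insert a comment and bump its topic's updated_at in one transaction."""
    now = time.time() if now is None else now
    with conn:
        conn.execute(
            "INSERT INTO comments (topic_id, body) VALUES (?, ?)", (topic_id, body)
        )
        conn.execute("UPDATE topics SET updated_at = ? WHERE id = ?", (now, topic_id))


def recent_topic_comments(conn, limit=10, offset=0):
    """Latest comment of each topic on the requested page of recent topics."""
    return conn.execute(RECENT_QUERY, {"limit": limit, "offset": offset}).fetchall()
```

Because the page of topics is chosen from `topics` alone, `offset` gives the pagination mentioned above without touching every topic's comments.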