I’m still learning about MySQL. I may be making a very basic error, and I’m prepared to be chastened here…
What this query is trying to do is select the top members from our website based on a count of the number of book and recipe reviews they have made.
I’m making a calculation of the total in the SQL query itself. The query is slow (9 seconds) and will definitely not scale considering we only have 400 members and a few thousand reviews so far and it’s growing quite quickly.
I presume it’s doing a full table scan here, and that the calculation is slowing it down, but I don’t know of an alternative way to do this and would love some wisdom.
Here’s the SQL statement:
SELECT users.*, COUNT( DISTINCT bookshelf.ID ) AS titles, COUNT( DISTINCT book_reviews.ID ) as bookreviews, COUNT( DISTINCT recipe_reviews.ID ) AS numreviews, COUNT( DISTINCT book_reviews.ID ) + COUNT( DISTINCT recipe_reviews.ID ) as reviewtotal
FROM users
LEFT OUTER JOIN recipe_reviews ON recipe_reviews.user_id = users.ID
LEFT OUTER JOIN book_reviews ON book_reviews.user_id = users.ID
LEFT OUTER JOIN bookshelf ON users.ID = bookshelf.user_id
GROUP BY users.ID
ORDER BY reviewtotal DESC
LIMIT 8
Here is the EXPLANATION:
+----+-------------+----------------+-------+-------------------+-------------------+---------+---------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+-------+-------------------+-------------------+---------+---------------------+------+---------------------------------+
| 1 | SIMPLE | users | index | NULL | PRIMARY | 4 | NULL | 414 | Using temporary; Using filesort |
| 1 | SIMPLE | recipe_reviews | ref | recipe_reviews_fk | recipe_reviews_fk | 5 | users.ID | 12 | |
| 1 | SIMPLE | book_reviews | ref | user_id | user_id | 5 | users.ID | 4 | |
| 1 | SIMPLE | bookshelf | ref | recipe_reviews_fk | recipe_reviews_fk | 5 | users.ID | 13 | |
+----+-------------+----------------+-------+-------------------+-------------------+---------+---------------------+------+---------------------------------+
UPDATE & SOLVED:
I realized, and @recursive confirmed, that the query is the root of the problem. I’m getting Cartesian products from this. I rewrote it as a series of subqueries and the final working code is here:
SELECT *, bookreviews + recipereviews AS totalreviews
FROM (SELECT users.*,
(SELECT count(*) FROM bookshelf WHERE bookshelf.user_id = users.ID) as titles,
(SELECT count(*) FROM book_reviews WHERE book_reviews.user_id = users.ID) as bookreviews,
(SELECT count(*) FROM recipe_reviews WHERE recipe_reviews.user_id = users.ID) as recipereviews
FROM users) q
This gives me a result in milliseconds. There are also ways to do this with JOINs. See How to add together the results of several subqueries? if you want to follow this up.
You might try seeing if there is an improvement from removing the
DISTINCTmodifiers. Assuming theDISTINCTed fields are primary keys anyway, this could be causing unnecessary work.