I’ve a table of articles, a table of authors, and a table that maps articles to authors.
I’m doing the following query to find out the authors with the most articles:
SELECT a.*, count(*) c
FROM articleAuthors aa
LEFT JOIN authors a ON aa.author_id=a.id
GROUP BY (author_name)
ORDER BY c DESC LIMIT 50
However this query takes a whole minute to complete. The database has about 1,000,000 records in articles_to_authors table.
How could I speed up this GROUP BY query?
Under an assumption of the articleAuthors table having more than 50 distinct authors, I would pre-query just that component and limit to the 50 records you want. Ensure an index exists on (author_id). Also, ensure your authors table has an index on (id). Change your query to
The order by count descending in the prequery will pre-flush AND be pre-ordered by largest count first and stop after 50 records. Then, a simple join to the authors table to get the name and whatever else.
I have the group by based on the author_ID instead of the name as what if you have two authors called “bill board”… The actual ID will be distinct between the two of them.
Now, with the above being a query, you will always be required to query through all million records every time. For something like this, it would PROBABLY be better to add a single “AuthoredItems” column in the authors table. Then, via a trigger on the authorArticles table, when an entry gets added or deleted, just update the final count for the one author on the author table. Then, build an index on the “AuthoredItems” column. Then, you can super simplify the query by doing