I need a query that quickly shows the articles within a particular module (a subset of articles) that a user has NOT uploaded a PDF for. The query I am using below takes about 37 seconds, given there are 300,000 articles in the Article table, and 6,000 articles in the Module.
SELECT *
FROM article a
INNER JOIN article_module_map amm ON amm.article=a.id
WHERE amm.module = 2 AND
a.id NOT IN (
SELECT afm.article
FROM article_file_map afm
INNER JOIN article_module_map amm ON amm.article = afm.article
WHERE afm.organization = 4 AND
amm.module = 2
)
What I am doing in the above query is first truncating the list of articles to the selected module, and then further truncating that list to the articles that are not in the subquery. The subquery is generating a list of the articles that an organization has already uploaded PDF’s for. Hence, the end result is a list of articles that an organization has not yet uploaded PDF’s for.
Help would be hugely appreciated, thanks in advance!
EDIT 2012/10/25
With @fthiella’s help, the below query ran in an astonishing 1.02 seconds, down from 37+ seconds!
SELECT a.* FROM (
SELECT article.* FROM article
INNER JOIN article_module_map
ON article.id = article_module_map.article
WHERE article_module_map.module = 2
) AS a
LEFT JOIN article_file_map
ON a.id = article_file_map.article
AND article_file_map.organization=4
WHERE article_file_map.id IS NULL
I am not sure that i can understand the logic and the structure of the tables correctly. This is my query:
I extract all of the articles that have a module 2. I then select those that organization 4 didn’t provide a file.
I used a LEFT JOIN instead of a subquery. In some circumstances this could be faster.
EDIT Thank you for your comment. I wasn’t sure it would run faster, but it surprises me that it is so much slower! Anyway, it was worth a try!
Now, out of curiosity, I would like to try all the combinations of LEFT/INNER JOIN and subquery, to see which one runs faster, eg:
maybe removing *, and I would like to see what changes between the condition on the WHERE clause and on the ON clause… anyway I think it doesn’t help much, you should concentrate on indexes now.
Indexes on keys/foreign key should be okay already, but what if you add an index on
article_module_map.moduleand/orarticle_file_map.organization?