I have a SELECT statement that works, and runs fast enough on my tables

Question

0

Asked: May 11, 20262026-05-11T05:55:54+00:00 2026-05-11T05:55:54+00:00

I have a SELECT statement that works, and runs fast enough on my tables

0

I have a SELECT statement that works, and runs fast enough on my tables (<0.01sec on 50k+ products, 3k+ categories). But in my mind it’s not very elegant and would like to hear any suggestions on making it better.

There are 3 tables of interest:

products – key productID
categories – key categoryID
products_tree – link table (categories contain many products, products can belong to many categories)

I have a list of excluded categoryIDs [e.g. 1040,1050,1168] I want to select all the productIDs that belong to one of these excluded categories only if the product doesn’t belong to another NON-excluded category

My Query looks like this:

SELECT DISTINCT productID  FROM products_tree  WHERE    categoryID IN (1040,1050,1168)    AND productID NOT IN     ( SELECT DISTINCT productID        FROM products_tree        WHERE        categoryID NOT IN (1040,1050,1168)     );

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

score 0 · Answer 1 · 2026-05-11T05:55:55+00:00

I can think of a few methods, each of which perform differently depending on indexes and your particular database implementation. Some that may look slow can be optimised in ways you may not have imagined and so it’s worth trialling them all and comparing execution plans to see what is happening…

Note1: I use GROUP BY rather than DISTINCT, this is because it allows the omptimiser to make use of indexes. I’ve seen implementations work out that they can turn the DISTINCT in to a GROUP BY, but it’s highly worth using GROUP BY in the fist place to be sure. It also gets you thinking about indexes, which is never a bad thing.

Note2: Some queries like this take a while to optimise, as there are many options for the optimiser to evaluate. It is therefore often worth compiling all the different options in to stored procedures and comparing the execution of those stored procedures. This ensures your compare actually Query Time and not different Compile Times.

SELECT    [tree].productID FROM    products_tree AS [tree] WHERE    [tree].productID IN (1040,1050,1168)    AND NOT EXISTS (SELECT * FROM products_tree WHERE productID = [tree].productID AND categoryID NOT IN (1040,1050,1168))  GROUP BY    [tree].productID

SELECT    [tree].productID FROM    products_tree AS [tree] LEFT OUTER JOIN    (       SELECT          productID       FROM          product_tree       WHERE          productID NOT IN (1040,1050,1168)       GROUP BY          productID     )     AS [ok_products]        ON [ok_products].productID = [tree].productID WHERE    [tree].productID IN (1040,1050,1168)    AND [ok_products].productID IS NULL  GROUP BY    [tree].productID

SELECT    [tree].productID FROM    products_tree AS [tree] GROUP BY    [tree].productID HAVING        MAX(CASE WHEN [tree].productID     IN (1040,1050,1168) THEN 1 ELSE 0 END) = 1    AND MAX(CASE WHEN [tree].productID NOT IN (1040,1050,1168) THEN 1 ELSE 0 END) = 0

There are others, and variations of each, but this should give you a very good start. But I really would stress the use of GROUP BY and the consideration to INDEXES 🙂

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have a SELECT statement that works, and runs fast enough on my tables

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply