Here is a simplified version of my data: products: +—-+———–+ | id | name

Question

Asked: June 13, 2026 (2026-06-13T12:25:19+00:00)

Here is a simplified version of my data:

products:
+----+-----------+
| id | name      |
+----+-----------+
|  1 | Product X |
|  2 | Product Y |
|  3 | Product Z |
+----+-----------+

categories:
+----+---------------+
| id | name          |
+----+---------------+
|  1 | Hotel         |
|  2 | Accommodation |
+----+---------------+

category_product
+----+------------+-------------+
| id | product_id | category_id |
+----+------------+-------------+
|  1 |          1 |           1 |
|  2 |          1 |           2 |
|  3 |          2 |           1 |
|  4 |          3 |           2 |
+----+------------+-------------+

How do I construct an efficient query that will only retrieve products that have both categories “Hotel” and “Accommodation” related (eg. Product X)?

I first tried a join approach:

SELECT *
FROM products p
JOIN category_product cp
ON p.id = cp.product_id
WHERE cp.category_id = 1 OR cp.category_id = 2

^ This doesn’t work because it doesn’t constrain the query to products that have both categories; it returns any product that has either one.
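To make the problem with the OR filter concrete, here is a minimal sketch that loads the sample tables above into an in-memory SQLite database and runs the same join. SQLite and the Python `sqlite3` harness are assumptions made only so the example is self-contained; the original question does not say which database is in use.

```python
import sqlite3

# In-memory database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE category_product (
        id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES products(id),
        category_id INTEGER REFERENCES categories(id)
    );
    INSERT INTO products VALUES (1, 'Product X'), (2, 'Product Y'), (3, 'Product Z');
    INSERT INTO categories VALUES (1, 'Hotel'), (2, 'Accommodation');
    INSERT INTO category_product VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 2);
""")

# The join from the question: each link row is tested on its own,
# so a product qualifies as soon as it has either category.
or_join_rows = conn.execute("""
    SELECT p.name, cp.category_id
    FROM products p
    JOIN category_product cp ON p.id = cp.product_id
    WHERE cp.category_id = 1 OR cp.category_id = 2
    ORDER BY p.id, cp.category_id
""").fetchall()

for row in or_join_rows:
    print(row)
```

On the sample data, Product X comes back twice (once per category), and Product Y and Product Z each come back once, even though neither has both categories.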

I have found an approach using sub-queries that works… but I’ve been warned against sub-queries for performance reasons:

SELECT *
FROM products p
WHERE
(
    SELECT id
    FROM category_product
    WHERE product_id = p.id
    AND category_id = 1
)
AND
(
    SELECT id
    FROM category_product
    WHERE product_id = p.id
    AND category_id = 2
)

Are there any better solutions (or how about alternatives)? I’ve considered de-normalizing categories to an extra column on products but would ideally like to avoid that. Hoping for a magic bullet solution!

UPDATE

I’ve run some of the (great) solutions provided in the answers:
My data is 235 000 category_product rows and 58 000 products. Obviously, benchmarks always depend on environment, indexes, etc.

“Relational division” @podiluska

2 categories: 2826 rows ~ 20 ms
5 categories: 46 rows ~ 25-30 ms
8 categories: 1 row ~ 25-30 ms

“Where exists” @Tim Schmelter

2 categories: 2826 rows ~ 5-7 ms
5 categories: 46 rows ~ 30 ms
8 categories: 1 row ~ 300 ms

The results start to diverge as more categories are added. I’ll look at using “relational division” since it gives consistent timings, but implementation concerns might lead me to “where exists” too (long format: http://pastebin.com/6NRX0QbJ)
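The “where exists” answer itself is not reproduced on this page, so the following is only a sketch of what such a query commonly looks like: one correlated EXISTS clause per required category. The helper name `products_in_all_categories_exists`, the use of SQLite, and the Python `sqlite3` harness are all assumptions for illustration, not the original answer's code.

```python
import sqlite3

# Sample data from the question, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE category_product (
        id INTEGER PRIMARY KEY,
        product_id INTEGER,
        category_id INTEGER
    );
    INSERT INTO products VALUES (1, 'Product X'), (2, 'Product Y'), (3, 'Product Z');
    INSERT INTO category_product VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 2);
""")


def products_in_all_categories_exists(conn, category_ids):
    """Hypothetical helper: return products linked to every given category,
    using one correlated EXISTS clause per category."""
    if not category_ids:
        raise ValueError("at least one category id is required")
    exists_clause = (
        "EXISTS (SELECT 1 FROM category_product cp "
        "WHERE cp.product_id = p.id AND cp.category_id = ?)"
    )
    sql = (
        "SELECT p.id, p.name FROM products p WHERE "
        + " AND ".join([exists_clause] * len(category_ids))
        + " ORDER BY p.id"
    )
    # Category ids are bound as parameters, never interpolated into the SQL.
    return conn.execute(sql, list(category_ids)).fetchall()


exists_rows = products_in_all_categories_exists(conn, [1, 2])
print(exists_rows)
```

Because this form adds one subquery per category, the statement itself grows with the category list, which may be one reason the timings above climb at 8 categories; that is a guess, not something measured here.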

1 Answer

Editorial Team · Answer 1 · 2026-06-13T12:25:20+00:00

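SELECT p.*
FROM products p
INNER JOIN
(
    -- keep only products linked to both category 1 and category 2
    SELECT product_id
    FROM category_product
    WHERE category_id IN (1, 2)
    GROUP BY product_id
    -- the count must equal the number of categories in the IN list
    HAVING COUNT(DISTINCT category_id) = 2
) pc
    ON p.id = pc.product_id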

This technique is called “relational division”: the grouped subquery keeps only those products whose rows cover every category in the list.
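As a runnable sketch of the query above, the following loads the sample data from the question into an in-memory SQLite database and generalizes the category list. The helper name `products_in_all_categories` and the Python `sqlite3` harness are assumptions for illustration; the SQL is the same relational-division shape, with the IN list and the expected count filled in from the parameters.

```python
import sqlite3

# Sample data from the question, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE category_product (
        id INTEGER PRIMARY KEY,
        product_id INTEGER,
        category_id INTEGER
    );
    INSERT INTO products VALUES (1, 'Product X'), (2, 'Product Y'), (3, 'Product Z');
    INSERT INTO category_product VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 2);
""")


def products_in_all_categories(conn, category_ids):
    """Hypothetical helper: relational division over category_product."""
    wanted = sorted(set(category_ids))  # duplicates would break the count check
    if not wanted:
        raise ValueError("at least one category id is required")
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT p.id, p.name
        FROM products p
        INNER JOIN (
            SELECT product_id
            FROM category_product
            WHERE category_id IN ({placeholders})
            GROUP BY product_id
            HAVING COUNT(DISTINCT category_id) = ?
        ) pc ON p.id = pc.product_id
        ORDER BY p.id
    """
    # Bind the category ids first, then the number of distinct ids required.
    return conn.execute(sql, [*wanted, len(wanted)]).fetchall()


division_rows = products_in_all_categories(conn, [1, 2])
print(division_rows)
```

Only the placeholders are interpolated into the SQL text; the category ids themselves are bound as parameters. The statement keeps the same shape however many categories are requested, since only the IN list and the count change.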
