I’ve got two tables with a many to one relationship which I’ll call Parent_Table and Child_Table (i.e. a parent has zero or more children, but children have exactly one parent). I need to count the number of parents who have at least one child that fulfills some condition. Which query is optimal?
Option 1 (pretty sure it’s not this one)
SELECT COUNT(DISTINCT(pt.ID))
FROM PARENT_TABLE pt
JOIN CHILD_TABLE ct
ON pt.ID = ct.PARENT_ID
WHERE <parent meets some condition>
AND <child meets some condition>
Option 2
SELECT COUNT(pt.ID)
FROM PARENT_TABLE pt
WHERE pt.ID in
(
SELECT ct.PARENT_ID
FROM CHILD_TABLE ct
WHERE <child meets condition>
)
AND <parent meets some condition>
Option 3 (my guess as the fastest)
SELECT COUNT(pt.ID)
FROM PARENT_TABLE pt
WHERE EXISTS
(
SELECT 1
FROM CHILD_TABLE ct
WHERE ct.PARENT_ID = pt.ID
AND <child meets condition>
)
AND <parent meets some condition>
Or is it something else entirely? Does it depend on the sizes of each table, or the complexity of the two conditions, or whether the data is sorted?
EDIT: Database is Oracle.
The first query is slow, the others should run fast on most DB’s.
Without knowing the DB it’s hard to say more:
But: count(*) is often faster than count(names_field) and never slower
count(distinct (afield)) is slow
That depends on the DB and the exact version of the DB.
Yes, that plays a big part
Possible
If you want a fast select, all fields used to join must be indexed.
And all fields used in a where clause must either be indexed or low-cardinality.