Given:
InsuranceCompanies (cid, name, phone, address)
Doctors (did, name, specialty, address, phone, age, cid)
Patients (pid, name, address, phone, age, gender, cid)
Visits (vid, did, pid, date, description)
where
cid - insurance company code
did - doctor code
pid - patient code
vid - visit code
and a TASK: for each doctor, return the number of (different) patients aged 20-25,
is:
SELECT V.did, COUNT(V.pid)
FROM (SELECT DISTINCT V1.did, V1.pid
      FROM Visits V1, Patients P
      WHERE P.pid = V1.pid AND P.age >= 20 AND P.age <= 25) AS V
GROUP BY V.did
equivalent to:
SELECT V.did, COUNT(DISTINCT V.pid)
FROM Visits V, Patients P
WHERE P.pid = V.pid AND P.age >= 20 AND P.age <= 25
GROUP BY V.did
and are they both a good solution to the task?
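The second version looks fine to me, and both queries return the same result: the first removes duplicate (did, pid) pairs and then counts them per doctor, while the second counts the distinct pid values per doctor directly. When a query is compiled into an execution plan, the RDBMS’s optimizer chooses among several algorithms for the join and the duplicate elimination, so I don’t see the need for the intermediate DISTINCT subquery you introduce in the first version.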
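As a quick sanity check, here is a minimal sketch that runs both versions against a small in-memory SQLite database through Python's sqlite3 module. The sample rows are made up for illustration, only the columns the queries touch are created, and an ORDER BY is appended to each query purely so the two result lists can be compared directly. SQLite is an assumption here; the question does not say which RDBMS is in use.

```python
import sqlite3

# Throwaway in-memory database; only the columns the queries touch are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patients (pid INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE Visits   (vid INTEGER PRIMARY KEY, did INTEGER, pid INTEGER);

-- Made-up sample rows, not taken from the question.
INSERT INTO Patients (pid, name, age) VALUES
    (1, 'Ann', 22), (2, 'Bob', 25), (3, 'Cy', 40);
INSERT INTO Visits (vid, did, pid) VALUES
    (1, 10, 1), (2, 10, 1),  -- doctor 10 sees patient 1 twice
    (3, 10, 2),
    (4, 10, 3),              -- patient 3 is outside the age range
    (5, 20, 3),              -- doctor 20 has no patient aged 20-25
    (6, 30, 2);
""")

# ORDER BY is added only to make the comparison deterministic.
SUBQUERY_VERSION = """
SELECT V.did, COUNT(V.pid)
FROM (SELECT DISTINCT V1.did, V1.pid
      FROM Visits V1, Patients P
      WHERE P.pid = V1.pid AND P.age >= 20 AND P.age <= 25) AS V
GROUP BY V.did
ORDER BY V.did
"""

COUNT_DISTINCT_VERSION = """
SELECT V.did, COUNT(DISTINCT V.pid)
FROM Visits V, Patients P
WHERE P.pid = V.pid AND P.age >= 20 AND P.age <= 25
GROUP BY V.did
ORDER BY V.did
"""

rows_subquery = conn.execute(SUBQUERY_VERSION).fetchall()
rows_distinct = conn.execute(COUNT_DISTINCT_VERSION).fetchall()

print(rows_subquery)   # [(10, 2), (30, 1)]
print(rows_distinct)   # [(10, 2), (30, 1)]
```

On this sample, doctor 20 does not appear in either result, because the inner join drops doctors with no patient in the age range. If the task needs a zero count for such doctors, neither query as written provides it.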
If you want to be sure you have the best approach, look at the plans generated for the two queries and compare them, along with measurements such as reads and CPU time.
How to do that depends on the particular RDBMS you are using.
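As one illustration, the sketch below asks SQLite for the plan of each version using its EXPLAIN QUERY PLAN statement, again through Python's sqlite3 module. SQLite is an assumption made only to keep the example self-contained, and the helper name plan_details is hypothetical. Other systems expose the same idea differently, for example EXPLAIN in PostgreSQL and MySQL, or SET STATISTICS IO ON in SQL Server, so check the documentation for the one you use.

```python
import sqlite3

# Empty in-memory tables are enough to ask the planner for a plan.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patients (pid INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE Visits   (vid INTEGER PRIMARY KEY, did INTEGER, pid INTEGER);
""")

QUERIES = {
    "subquery": """
        SELECT V.did, COUNT(V.pid)
        FROM (SELECT DISTINCT V1.did, V1.pid
              FROM Visits V1, Patients P
              WHERE P.pid = V1.pid AND P.age >= 20 AND P.age <= 25) AS V
        GROUP BY V.did
    """,
    "count_distinct": """
        SELECT V.did, COUNT(DISTINCT V.pid)
        FROM Visits V, Patients P
        WHERE P.pid = V.pid AND P.age >= 20 AND P.age <= 25
        GROUP BY V.did
    """,
}


def plan_details(connection, sql):
    """Return the human-readable step descriptions of SQLite's plan for sql."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The description is the last column of each row returned by SQLite.
    return [row[-1] for row in rows]


plans = {name: plan_details(conn, sql) for name, sql in QUERIES.items()}

for name, steps in plans.items():
    print(name)
    for step in steps:
        print("   ", step)
```

The exact plan text varies with the SQLite version and with whatever indexes and statistics exist, so treat the output as something to compare on your own data rather than as a fixed result.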