Suppose master_table contains many records, and the “id” field of master_table, tableA, tableB, tableC and tableD means the same thing in the business sense.
For the two select statements shown below:
Will they both return the same result set?
Which one will perform better?
I think that if tableA_tmp, tableB_tmp, tableC_tmp and tableD_tmp all return small result sets, SQL1 will be faster than SQL2, because Oracle does not need to query tableA, tableB, tableC and tableD once for every master_table record.
But if tableA_tmp, tableB_tmp, tableC_tmp and tableD_tmp all return large result sets, SQL2 will be much faster, because the cost of joining many large result sets is much higher than the cost of querying tableA, tableB, tableC and tableD once for every master_table record.
Please correct me if I have misunderstood anything. Are there any other methods you would suggest?
SQL1:
select
    master_table.*,
    tableA_tmp.cnt as tableA_cnt,
    tableB_tmp.cnt as tableB_cnt,
    tableC_tmp.cnt as tableC_cnt,
    tableD_tmp.cnt as tableD_cnt
from
    master_table,
    (select tableA.id as id, count(1) as cnt from tableA group by tableA.id) tableA_tmp,
    (select tableB.id as id, count(1) as cnt from tableB group by tableB.id) tableB_tmp,
    (select tableC.id as id, count(1) as cnt from tableC group by tableC.id) tableC_tmp,
    (select tableD.id as id, count(1) as cnt from tableD group by tableD.id) tableD_tmp
where
    master_table.id = tableA_tmp.id(+) and
    master_table.id = tableB_tmp.id(+) and
    master_table.id = tableC_tmp.id(+) and
    master_table.id = tableD_tmp.id(+);
SQL2:
select
    master_table.*,
    (select count(*) from tableA where tableA.id = master_table.id) as tableA_cnt,
    (select count(*) from tableB where tableB.id = master_table.id) as tableB_cnt,
    (select count(*) from tableC where tableC.id = master_table.id) as tableC_cnt,
    (select count(*) from tableD where tableD.id = master_table.id) as tableD_cnt
from
    master_table;
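On the first question, the two statements are close but not identical as written. Where a master_table row has no matching rows in a child table, the outer join in SQL1 leaves the count NULL, while the scalar subquery in SQL2 returns 0, because count(*) over an empty set is 0. A minimal sketch of how SQL1 could be adjusted to match, using NVL (reduced to tableA only for brevity; the other three tables would follow the same pattern):

```sql
-- Sketch: make the outer-join version report 0 instead of NULL
-- when a master_table row has no matching rows in tableA.
select
    master_table.*,
    nvl(tableA_tmp.cnt, 0) as tableA_cnt
from
    master_table,
    (select tableA.id as id, count(*) as cnt from tableA group by tableA.id) tableA_tmp
where
    master_table.id = tableA_tmp.id(+);
```

With the NVL in place, both forms should report 0 for a master_table row that has no child rows.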
Joins are generally better than inline (scalar) subqueries in the select list: conceptually, such a subquery is executed once for every row returned by the main query.
That means SQL1 is better than SQL2, in 99% of cases at least.
In a few cases, the distribution of the data and the way the indexes are defined can tilt execution times in favour of SQL2, but this happens very rarely in an average database.
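Rather than relying on the rule of thumb alone, the two forms can be compared on the actual data. A minimal sketch using Oracle's EXPLAIN PLAN and DBMS_XPLAN.DISPLAY, shown for the scalar-subquery form reduced to one child table. It assumes the default plan table is available to the session, and the estimates it prints depend on the optimizer statistics being current:

```sql
-- Sketch: record the optimizer's plan for the scalar-subquery form.
explain plan for
select
    master_table.*,
    (select count(*) from tableA where tableA.id = master_table.id) as tableA_cnt
from
    master_table;

-- Show the plan that was just recorded.
select * from table(dbms_xplan.display);
```

Running the same two steps for the join form and comparing the plans, ideally alongside timings on realistic data volumes, should show which side of the 99% a particular schema falls on.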