I have read that Joins are more efficient than subqueries, I have a query that is extremely slow and uses lots of subqueries, therefore I would like to improve it but do not know how.
I have the following tables:
People \\this table stores lists of individual people with the following fields
(
ID, \\Primary Key
aacode Text, \\represents a individual house
PERSNO number, \\represent the number of the person in the house e.g. person number 1
HRP number, \\the PERSNO of the Housing Reference Person (HRP) the "main" person in the house
DVHsize number, \\the number of people in the house
R01 number, \\the persons relationship to the person who is PERSNO=1
R02 number, \\the persons relationship to the person who is PERSNO=2
R03 number, \\the persons relationship to the person who is PERSNO=3
AgeCat text, \\the age range of the person e.g. 30-44
xMarSta number, \\representing the marital satus of the person
)
Relatives \\this table stores the possible R01 numbers and their text equivalents
(
ID Primary Key, \\all possible R01 values
Relationship text, \\meaning of the corisponding R01 values
)
xMarSta \\this table store the possible xMarSta values and their text equivalents
(
ID Primary Key \\all possible xMarSta values
Marital text, \\meaning of corresponding R01 values
)
The query is:
HsHld – the goal of this query is to produce for each house (i.e. each aacode) a text sting describing the house in the form [Marital][AgeCat][Relationship][AgeCat][Relationship][AgeCat] etc. So an output for a three person house might look like Married(30-44)Spouse(30-44)Child(1-4)
I know my current code for HsHld is terrible, but it is included below:
SELECT People.ID, People.aacode, People.PERSNO,
People.HRP, People.DVHsize, xMarSta.Marital,
[Marital] & " (" & [AgeCat] & ")" & [RAL2] & [RAge2] &
[RAL3] & [RAge3] & [RAL4] & [RAge4] & [RAL5] & [RAge5] &
[RAL6] & [RAge6] & [RAL7] & [RAge7] & [RAL8] & [RAge8] AS HsTyp,
(SELECT Fam2.R01 FROM People AS Fam2 WHERE Fam2.aacode = People.aacode
AND Fam2.PERSNO = 2) AS Rel2,
(SELECT Fam3.R01 FROM People AS Fam3 WHERE Fam3.aacode = People.aacode
AND Fam3.PERSNO = 3) AS Rel3,
Switch([Rel2] Is Null,Null,[Rel2]=-9,'DNA',[Rel2]=-8,'NoAns',
[Rel2]=1,'Spouse',[Rel2]=2,'Cohabitee',[Rel2]<7,'Child',
[Rel2]<10,'Parent',[Rel2]<15,'Sibling',[Rel2]=15,'Grandchild',
[Rel2]=16,'Grandparent',[Rel2]=17,'OtherRelative',
[Rel2]=20,'CivilPartner',True,'Other') AS RAL2,
Switch([Rel3] Is Null,Null,[Rel3]=-9,'DNA',[Rel3]=-8,'NoAns',
[Rel3]=1,'Spouse',[Rel3]=2,'Cohabitee',[Rel3]<7,'Child',
[Rel3]<10,'Parent',[Rel3]<15,'Sibling',[Rel3]=15,'Grandchild',
[Rel3]=16,'Grandparent',[Rel3]=17,'OtherRelative',
[Rel3]=20,'CivilPartner',True,'Other') AS RAL3,
(Select FAge2.AgeCat FROM People AS FAge2
WHERE FAge2.aacode = People.aacode
AND FAge2.PERSNO = 2
) AS RAge2,
(Select FAge3.AgeCat FROM People AS FAge3
WHERE FAge3.aacode = People.aacode AND FAge3.PERSNO = 3
) AS RAge3
FROM Relatives
RIGHT JOIN (xMarSta RIGHT JOIN People ON xMarSta.ID=People.xMarSta)
ON Relatives.ID=People.R01
WHERE (((People.HRP)=[People.PERSNO]))
ORDER BY People.aacode;
There are several key things that need to change.
- At the moment I can’t get a join from the Rel field to the Relatives
table to work, so I am using a Switch function called RAL there must
be a better way. - For simplicity in the post I have only included Rel2 & Rel3 etc but in the actual code it goes up to Rel13! So the problem of performance is even worse.
- I want to replace these subqueries with joins, but as the subquery
looks into another record in the same table I am unsure how to go
about this. - I’m very out of my depth with this, I know a little SQL but the
complexity of this problem is too much for my limited knowledge
The very first thing is that you have a relational situation but the table structure you have is using columns to represent relationships. This gives you the R01, R02, R03 … R13 columns on your table. Unfortunately you will not be able to change performance dramatically because your table structure is repetitive denormalized instead of relational. This means that your query will need all this repetitive code, as you mentioned repeating 13 times. That also means that your switch function can be replaced by a join but again will be repeated 13 times.
Right, now back to your query, you have multiple sub-selects on your query and you need to join the related tables on a left join on the FROM clause and use the new related alias on your select. now you will see on the example below that for each R01, R02 field you will have a Fam2, Fam3 relation and you will need to do this 13 times on your case, and for each one you need to link to the relatives table (as i did called Relat2, Relat3, etc). Now if you can change your database structure for a normalized structure, you could really simplify this query and use much simpler joins.
See if this one helps you understand the process: