I am running an SQL query which self-joins the same table 24 times in order to “look up” particular rows of the table according to 24 different criteria, so that I can use all those values in a calculation. While the performance is perfectly fine (the table is indexed and the join criteria are restrictive), I can’t help but feel there is a code smell here.
Is there a better way of doing lookups in SQL?
(Apologies for not including an example; I hope I have phrased the question generally enough.)
Edit: attempting an example anyway:
CREATE TABLE key_rows (      -- renamed from "key": KEY is a reserved word in MySQL
    pk1 int,
    pk2 int,
    pk3 int,
    PRIMARY KEY (pk1, pk2, pk3)
);
CREATE TABLE func_values (   -- renamed from "values": reserved in standard SQL, SQL Server, MySQL
    pk1 int,
    pk2 int,
    pk3 int,
    pk4 int,
    pk5 int,
    value int,
    PRIMARY KEY (pk1, pk2, pk3, pk4, pk5)
);
SELECT k.pk1, k.pk2, k.pk3,
       -- note: int / int is truncating integer division in many dialects
       v1.value + v2.value - v3.value * (v4.value / v5.value) + ... + v24.value AS result
FROM key_rows k
LEFT JOIN func_values v1
       ON v1.pk1 = k.pk1
      AND v1.pk2 = k.pk2
      AND v1.pk3 = k.pk3
      AND v1.pk4 = 100
      AND v1.pk5 = 200
LEFT JOIN func_values v2
       ON v2.pk1 = k.pk1
      AND v2.pk2 = k.pk2
      AND v2.pk3 = k.pk3
      AND v2.pk4 = 400
      AND v2.pk5 = 800
-- ... (joins v3 to v23 follow the same pattern)
LEFT JOIN func_values v24
       ON v24.pk1 = k.pk1
      AND v24.pk2 = k.pk2
      AND v24.pk3 = k.pk3
      AND v24.pk4 = 900
      AND v24.pk5 = 700;
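As a sanity check, here is a minimal runnable sketch of the same pattern using Python's built-in sqlite3 module, cut down to two of the 24 lookups (v1 and v2) with made-up sample data. The table names key_rows and func_values are hypothetical renames, since KEY and VALUES are reserved words in several SQL dialects.

```python
import sqlite3

# Reduced, hypothetical version of the question's schema: only lookups v1 and v2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE key_rows (
    pk1 INTEGER, pk2 INTEGER, pk3 INTEGER,
    PRIMARY KEY (pk1, pk2, pk3)
);
CREATE TABLE func_values (
    pk1 INTEGER, pk2 INTEGER, pk3 INTEGER, pk4 INTEGER, pk5 INTEGER,
    value INTEGER,
    PRIMARY KEY (pk1, pk2, pk3, pk4, pk5)
);
INSERT INTO key_rows VALUES (1, 1, 1), (2, 2, 2);
INSERT INTO func_values VALUES
    (1, 1, 1, 100, 200, 10),
    (1, 1, 1, 400, 800, 5),
    (2, 2, 2, 100, 200, 7);   -- key (2, 2, 2) has no (400, 800) row
""")

# One LEFT JOIN per lookup, each pinned to a fixed (pk4, pk5) pair.
rows = conn.execute("""
SELECT k.pk1, k.pk2, k.pk3,
       v1.value + v2.value AS result
FROM key_rows k
LEFT JOIN func_values v1
       ON v1.pk1 = k.pk1 AND v1.pk2 = k.pk2 AND v1.pk3 = k.pk3
      AND v1.pk4 = 100 AND v1.pk5 = 200
LEFT JOIN func_values v2
       ON v2.pk1 = k.pk1 AND v2.pk2 = k.pk2 AND v2.pk3 = k.pk3
      AND v2.pk4 = 400 AND v2.pk5 = 800
ORDER BY k.pk1
""").fetchall()

print(rows)  # [(1, 1, 1, 15), (2, 2, 2, None)]
```

With LEFT JOIN, a missing lookup yields NULL, and NULL propagates through the arithmetic, so the whole result is NULL for that key (as for (2, 2, 2) here). Wrapping individual lookups in COALESCE is one option if a default value is meaningful.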
Edit 2: The reason for this structure is that the values table represents (mathematically speaking) a function of 5 variables, with pre-computed return values stored in the table for a variety of parameters.
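For comparison, here is a technique that is not used in the question or in the answer below. Because every lookup shares (pk1, pk2, pk3) and differs only in (pk4, pk5), the same values can also be fetched with a single join plus conditional aggregation (a pivot). This is a minimal sqlite3 sketch, again with hypothetical table names, made-up data and only two lookups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE key_rows (
    pk1 INTEGER, pk2 INTEGER, pk3 INTEGER,
    PRIMARY KEY (pk1, pk2, pk3)
);
CREATE TABLE func_values (
    pk1 INTEGER, pk2 INTEGER, pk3 INTEGER, pk4 INTEGER, pk5 INTEGER,
    value INTEGER,
    PRIMARY KEY (pk1, pk2, pk3, pk4, pk5)
);
INSERT INTO key_rows VALUES (1, 1, 1), (2, 2, 2);
INSERT INTO func_values VALUES
    (1, 1, 1, 100, 200, 10),
    (1, 1, 1, 400, 800, 5),
    (2, 2, 2, 100, 200, 7);
""")

# Join func_values once (restricted to the needed (pk4, pk5) pairs), then
# pick each lookup out of the group with MAX(CASE ...).
rows = conn.execute("""
SELECT k.pk1, k.pk2, k.pk3,
       MAX(CASE WHEN v.pk4 = 100 AND v.pk5 = 200 THEN v.value END)
     + MAX(CASE WHEN v.pk4 = 400 AND v.pk5 = 800 THEN v.value END) AS result
FROM key_rows k
LEFT JOIN func_values v
       ON v.pk1 = k.pk1 AND v.pk2 = k.pk2 AND v.pk3 = k.pk3
      AND ((v.pk4 = 100 AND v.pk5 = 200) OR (v.pk4 = 400 AND v.pk5 = 800))
GROUP BY k.pk1, k.pk2, k.pk3
ORDER BY k.pk1
""").fetchall()

print(rows)  # [(1, 1, 1, 15), (2, 2, 2, None)]
```

This mostly trades 24 near-identical join clauses for 24 near-identical CASE expressions. Whether it performs better than 24 indexed joins depends on the engine and the data, so it is an alternative to measure rather than a clear improvement.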
To start with, this isn’t a self-join at all.
A self-join is when a table is joined to itself.
Examples are parent-child relationships in hierarchies (an employee row whose manager is another row of the same employee table) and people who are related to other people (literally parent and child).
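For contrast, this is a minimal sketch of a genuine self-join. It uses a hypothetical employee table whose manager_id points at another row of the same table (sqlite3, made-up data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    manager_id INTEGER REFERENCES employee(id)   -- points into the same table
);
INSERT INTO employee VALUES (1, 'Ada', NULL), (2, 'Bob', 1), (3, 'Cy', 1);
""")

# The same table appears twice under two aliases: e (employee) and m (manager).
pairs = conn.execute("""
SELECT e.name AS employee_name, m.name AS manager_name
FROM employee e
JOIN employee m ON m.id = e.manager_id
ORDER BY e.id
""").fetchall()

print(pairs)  # [('Bob', 'Ada'), ('Cy', 'Ada')]
```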
Using the same table in several different roles, as you do here, is not that uncommon.
If the different values in the table were not related in some essential way, I would have a problem with the design as a case of the “one true lookup table”, where a variety of unrelated entity lookups are stored together and distinguished by a type code. You end up with billing addresses, customers, shipping addresses, products and all sorts of other things in the same lookup table.
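In data warehouses, it is also common to use the same dimension in different roles (often called role-playing dimensions), particularly date or time dimensions, for example an order date and a ship date that both refer to the same date dimension.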
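This is a minimal sketch of a role-playing date dimension, with hypothetical date_dim and order_fact tables and made-up data. The same dimension is joined twice under different aliases, once per role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (
    date_key INTEGER PRIMARY KEY,   -- e.g. 20240105 for 2024-01-05
    weekday TEXT NOT NULL
);
CREATE TABLE order_fact (
    order_id INTEGER PRIMARY KEY,
    order_date_key INTEGER REFERENCES date_dim(date_key),
    ship_date_key INTEGER REFERENCES date_dim(date_key)
);
INSERT INTO date_dim VALUES (20240105, 'Friday'), (20240108, 'Monday');
INSERT INTO order_fact VALUES (1, 20240105, 20240108);
""")

# date_dim plays two roles: od = order date, sd = ship date.
rows = conn.execute("""
SELECT f.order_id, od.weekday AS ordered_on, sd.weekday AS shipped_on
FROM order_fact f
JOIN date_dim od ON od.date_key = f.order_date_key
JOIN date_dim sd ON sd.date_key = f.ship_date_key
""").fetchall()

print(rows)  # [(1, 'Friday', 'Monday')]
```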
What would be a smell is joining the same lookup table over and over for columns that are being used as an array, for instance first_child, second_child, third_child, since such a repeating group is typically a violation of normalization.
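To illustrate the difference, here is a small sqlite3 sketch with hypothetical tables and made-up data. family_wide stores children as an array of columns and needs one join per column. parent_child stores one row per child and needs a single join, however many children there are.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Smell: children stored as a repeating group of columns.
CREATE TABLE family_wide (
    parent_id INTEGER REFERENCES person(id),
    first_child INTEGER REFERENCES person(id),
    second_child INTEGER REFERENCES person(id),
    third_child INTEGER REFERENCES person(id)
);
-- Normalized: one row per (parent, child) pair.
CREATE TABLE parent_child (
    parent_id INTEGER REFERENCES person(id),
    child_id INTEGER REFERENCES person(id),
    PRIMARY KEY (parent_id, child_id)
);
INSERT INTO person VALUES (1, 'Pat'), (2, 'Al'), (3, 'Bea');
INSERT INTO family_wide VALUES (1, 2, 3, NULL);
INSERT INTO parent_child VALUES (1, 2), (1, 3);
""")

# Wide design: one join to person per "array slot".
wide = conn.execute("""
SELECT c1.name, c2.name, c3.name
FROM family_wide f
LEFT JOIN person c1 ON c1.id = f.first_child
LEFT JOIN person c2 ON c2.id = f.second_child
LEFT JOIN person c3 ON c3.id = f.third_child
""").fetchall()

# Normalized design: a single join returns every child as its own row.
narrow = conn.execute("""
SELECT c.name
FROM parent_child pc
JOIN person c ON c.id = pc.child_id
WHERE pc.parent_id = 1
ORDER BY c.id
""").fetchall()

print(wide)    # [('Al', 'Bea', None)]
print(narrow)  # [('Al',), ('Bea',)]
```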
My only concern with what you have shown here is the magic numbers, which appear to be used to pick a 3-dimensional slice out of the 5-dimensional space of all values. I assume these are themselves defined in a table somewhere (pk4, pk5, description).
Given that, I would consider turning each (pk4, pk5) slice into a named view to make the query more readable.
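This is a minimal sketch of that idea. It assumes a hypothetical slice_def table holding (pk4, pk5, description) and hypothetical view names base_rate and adjustment; the real descriptions would come from whatever the magic numbers actually mean. Each view is defined by description, so the numbers themselves appear only in slice_def.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE func_values (
    pk1 INTEGER, pk2 INTEGER, pk3 INTEGER, pk4 INTEGER, pk5 INTEGER,
    value INTEGER,
    PRIMARY KEY (pk1, pk2, pk3, pk4, pk5)
);
-- Hypothetical table documenting what each (pk4, pk5) pair means.
CREATE TABLE slice_def (
    pk4 INTEGER, pk5 INTEGER, description TEXT NOT NULL UNIQUE,
    PRIMARY KEY (pk4, pk5)
);
INSERT INTO slice_def VALUES (100, 200, 'base rate'), (400, 800, 'adjustment');
INSERT INTO func_values VALUES (1, 1, 1, 100, 200, 10), (1, 1, 1, 400, 800, 5);

-- One named view per slice; callers only join on pk1, pk2, pk3.
CREATE VIEW base_rate AS
    SELECT v.pk1, v.pk2, v.pk3, v.value
    FROM func_values v
    JOIN slice_def s ON s.pk4 = v.pk4 AND s.pk5 = v.pk5
    WHERE s.description = 'base rate';
CREATE VIEW adjustment AS
    SELECT v.pk1, v.pk2, v.pk3, v.value
    FROM func_values v
    JOIN slice_def s ON s.pk4 = v.pk4 AND s.pk5 = v.pk5
    WHERE s.description = 'adjustment';
""")

row = conn.execute("""
SELECT b.value + a.value
FROM base_rate b
JOIN adjustment a ON a.pk1 = b.pk1 AND a.pk2 = b.pk2 AND a.pk3 = b.pk3
""").fetchone()

print(row)  # (15,)
```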
In SQL Server (or DB2, which has the same construct), I would actually consider using a single inline table-valued function (ITVF) parameterized on pk4 and pk5 instead. You end up with one ITVF instead of many views, and because both parameters must be supplied, it helps a little to prevent someone from accidentally joining with incomplete join criteria.
But all this is simply cleanup; the design of the query and tables seems pretty sound to me.