I am using a MySQL database. Here’s my situation:
I need a SELECT query that returns the list of projects that can be completed using N supplies, where N is an array of supplies. The list must include every project that can be completed using any or all of the N supplies, but cannot include any project that requires a supply not listed in N. (E.g., in the ‘make sketch’ project from the tables below, paper has no substitute, but pencil can be substituted by pen. If the query searches for projects that can be completed using pencil, pen, and pencil sharpener, then ‘make sketch’ should not be returned as a project that can be completed, even though it uses some of the supplies listed.)
Additionally, some of the supplies required by certain projects can be substituted by other supplies; however, just because one project can use a substitute supply does not mean that another project would work with that same substitute. (E.g., in the ‘sharpen pencil’ project below, pen cannot be a substitute for pencil; for ‘make sketch’, however, it can.)
These are my tables:
Projects
+----+---------------------+
| id | name                |
+----+---------------------+
|  1 | make sketch         |
|  2 | sharpen pencil      |
|  3 | make paper airplane |
+----+---------------------+
Supplies
+----+------------------+
| id | name             |
+----+------------------+
|  1 | paper            |
|  2 | pencil           |
|  3 | pen              |
|  4 | pencil sharpener |
+----+------------------+
ProjectSupplies
+----+-----------+------------+
| id | projectid | supplyid   |
+----+-----------+------------+
|  1 |         1 |          1 |
|  2 |         1 |          2 |
|  3 |         2 |          2 |
|  4 |         2 |          4 |
|  5 |         3 |          1 |
+----+-----------+------------+
SubstituteSupplies
+-------------------+------------+
| projectsuppliesid | supplyid   |
+-------------------+------------+
|                 2 |          3 |
+-------------------+------------+
The data isn’t exhaustive by any means but you should get the point.
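For anyone who wants to reproduce the sample data, a minimal setup sketch follows. Only the table names, column names, and rows come from the tables above; the column types and key constraints are assumptions chosen so that the data loads.

```sql
-- Sketch only: types and constraints are assumed, not taken from the post.
CREATE TABLE projects (
    id   INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL
);
CREATE TABLE supplies (
    id   INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL
);
CREATE TABLE projectsupplies (
    id        INT PRIMARY KEY,
    projectid INT NOT NULL,
    supplyid  INT NOT NULL,
    UNIQUE (projectid, supplyid)      -- assumed: a supply is listed once per project
);
CREATE TABLE substitutesupplies (
    projectsuppliesid INT NOT NULL,   -- refers to projectsupplies.id
    supplyid          INT NOT NULL,   -- the supply that may be used instead
    PRIMARY KEY (projectsuppliesid, supplyid)
);

INSERT INTO projects VALUES
    (1, 'make sketch'), (2, 'sharpen pencil'), (3, 'make paper airplane');
INSERT INTO supplies VALUES
    (1, 'paper'), (2, 'pencil'), (3, 'pen'), (4, 'pencil sharpener');
INSERT INTO projectsupplies VALUES
    (1, 1, 1), (2, 1, 2), (3, 2, 2), (4, 2, 4), (5, 3, 1);
INSERT INTO substitutesupplies VALUES
    (2, 3);
```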
This is the query I came up with before updating the database (see the update below). It breaks the rules: the result includes projects that require paper, because COUNT treats the supplyid and its substitute as two separate requirements rather than as two ways of fulfilling the same requirement.
SELECT projects.name FROM supplies
INNER JOIN projectsupplies ON supplyid = supplies.id OR substitute = supplies.id
INNER JOIN projects ON projects.id = projectid
WHERE supplies.id IN (2,3,4)
GROUP BY projects.name
HAVING COUNT(*) <= 3
ORDER BY projects.id
Is there a way to turn this:
INNER JOIN projectsupplies ON supplyid = supplies.id OR substitute = supplies.id
into essentially this:
INNER JOIN projectsupplies ON (supplies.id = supplyid) ? (supplies.id = supplyid) : (supplies.id = substitute)
or something similar to that using an if statement or whatever in order to make the query result correct?
One problem that I had been experiencing is that the above query will return ‘make sketch’ as a valid project even though, as specified in the query, there is no paper.
The end goal is to be able to accomplish this on a large scale with many projects and many supplies.
UPDATE: I found an issue in the design of my database that made it impossible for a supply to have more than one substitute. I corrected the problem to allow for many substitutes and updated the tables above as necessary, so the SELECT query above is no longer applicable. I still, however, need to accomplish the goal described at the top of this post.
An ‘OR’ in a join condition can usually be rewritten as a UNION of two queries, one for each alternative.
After the schema changed significantly
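Under the revised schema the substitutes live in SubstituteSupplies, so the two alternatives of the ‘OR’ become the two halves of a UNION. A minimal sketch, assuming the lowercase table names used in the question’s query:

```sql
-- Every (project, supply) pair that can play a part in a project:
-- first the supply named in projectsupplies itself ...
SELECT ps.projectid, ps.supplyid
  FROM projectsupplies ps
UNION
-- ... then any substitute registered for that projectsupplies row.
SELECT ps.projectid, ss.supplyid
  FROM projectsupplies ps
  JOIN substitutesupplies ss ON ss.projectsuppliesid = ps.id;
```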
And plugging that into the bigger query:
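One way the combined query might look is sketched below. It keeps the shape of the question’s query and only swaps the OR join for the UNION sub-select; the alias `u` is an assumed name, and `p.id` is added to the GROUP BY so the ORDER BY is valid under MySQL’s ONLY_FULL_GROUP_BY mode.

```sql
SELECT p.name
  FROM supplies s
  JOIN (
        SELECT ps.projectid, ps.supplyid
          FROM projectsupplies ps
        UNION
        SELECT ps.projectid, ss.supplyid
          FROM projectsupplies ps
          JOIN substitutesupplies ss ON ss.projectsuppliesid = ps.id
       ) AS u ON u.supplyid = s.id
  JOIN projects p ON p.id = u.projectid
 WHERE s.id IN (2, 3, 4)
 GROUP BY p.id, p.name
HAVING COUNT(*) <= 3
 ORDER BY p.id;
-- On the sample data this sketch returns two rows:
-- 'make sketch' and 'sharpen pencil'.
```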
(Note that at this stage, I’ve not verified whether the rest of the query is sound; I’ve only addressed how to get both supplies and substitute supplies into the joining operations.)
When run against IBM Informix Dynamic Server 11.70.FC2 on Mac OS X 10.7.5, the output from the sample data and the query above is:
Clearly, this is not correct; project 1 needs paper to be completed, but that is not one of the supplies available and there isn’t a substitute that’s available. So, the outer query is also not valid.
Fixing the main query
The projects that can be completed with a given supply list (here supplies 2, 3, 4) are those projects for which every necessary supply or substitute supply is in the list of supplies available. One gotcha is to ensure that if there is a substitute supply available but one non-substitutable supply is missing, the project is not completable.
So, for instance, Project 1 requires SupplyID 1 and either SupplyID 2 or, as its substitute, SupplyID 3; having both 2 and 3 available is not sufficient, because SupplyID 1 is still missing. In this example, there’s only one substitute, but in general, a project could need many SupplyIDs, and many of them could have substitutes. So, considerable care is required.
Applying Test-Driven Query Design (TDQD)
When facing a complex query, I build it up step by step. Having found that the original main query misses the mark, I’m going to rebuild it that way; the result is moderately complex, but comprehensible because each step is explained. There’s also a key design step (the clever bit of the algorithm) to come up with, but that comes with experience.
One criterion is that each project needs to have all the supplies it uses available. So, we need to know for each project how many different supplies it needs. This is easy:
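A sketch of that count, using the lowercase table names from the question; the alias `ItemCount` is an assumed name.

```sql
-- Number of different supplies each project needs (substitutes not counted).
SELECT projectid, COUNT(DISTINCT supplyid) AS ItemCount
  FROM projectsupplies
 GROUP BY projectid
 ORDER BY projectid;
-- Sample data: project 1 -> 2, project 2 -> 2, project 3 -> 1
```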
Result
Now comes the magic ingredient: the ‘SupplyGroup’. The UNION query generated previously needs to be extended to include a SupplyGroup. The SupplyGroup corresponds to the ‘desired’ SupplyID in the ProjectSupplies table; the SupplyID is a SupplyID that will meet the project’s criterion for equivalence, and is either the same SupplyID from ProjectSupplies or is the SupplyID from SubstituteSupplies:
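A sketch of the extended UNION follows. The table aliases are assumptions; the idea is that a substitute row carries the SupplyID it stands in for as its SupplyGroup.

```sql
-- Direct requirement: the supply is its own group.
SELECT ps.projectid, ps.supplyid AS SupplyGroup, ps.supplyid
  FROM projectsupplies ps
UNION
-- Substitute: grouped under the supply it can replace in that project.
SELECT ps.projectid, ps.supplyid AS SupplyGroup, ss.supplyid
  FROM projectsupplies ps
  JOIN substitutesupplies ss ON ss.projectsuppliesid = ps.id;
-- Sample data (projectid, SupplyGroup, supplyid), in no guaranteed order:
-- (1,1,1) (1,2,2) (1,2,3) (2,2,2) (2,4,4) (3,1,1)
```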
Result
Now we need to generate a list of the ProjectIDs and the SupplyGroups that can be satisfied from the list (2, 3, 4) of available SupplyIDs:
Result
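The filtering step just described can be sketched by wrapping the SupplyGroup UNION in a derived table (the alias `g` is an assumed name) and keeping only rows whose SupplyID is available. DISTINCT matters here: pencil and pen both satisfy group 2 of project 1, and that group should be listed once.

```sql
SELECT DISTINCT g.projectid, g.SupplyGroup
  FROM (
        SELECT ps.projectid, ps.supplyid AS SupplyGroup, ps.supplyid
          FROM projectsupplies ps
        UNION
        SELECT ps.projectid, ps.supplyid AS SupplyGroup, ss.supplyid
          FROM projectsupplies ps
          JOIN substitutesupplies ss ON ss.projectsuppliesid = ps.id
       ) AS g
 WHERE g.supplyid IN (2, 3, 4);
-- Sample data (projectid, SupplyGroup), in no guaranteed order:
-- (1,2) (2,2) (2,4)
```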
And, in fact, we need to count the number of distinct supply groups that are available for each project from that list:
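A sketch of that count, again with assumed aliases `g` and `ItemCount`. COUNT(DISTINCT ...) ensures that a supply and its substitute, when both are available, satisfy one requirement rather than two.

```sql
SELECT g.projectid, COUNT(DISTINCT g.SupplyGroup) AS ItemCount
  FROM (
        SELECT ps.projectid, ps.supplyid AS SupplyGroup, ps.supplyid
          FROM projectsupplies ps
        UNION
        SELECT ps.projectid, ps.supplyid AS SupplyGroup, ss.supplyid
          FROM projectsupplies ps
          JOIN substitutesupplies ss ON ss.projectsuppliesid = ps.id
       ) AS g
 WHERE g.supplyid IN (2, 3, 4)
 GROUP BY g.projectid
 ORDER BY g.projectid;
-- Sample data: project 1 -> 1, project 2 -> 2
```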
Result
Now we need to join the first query with the second on project ID and item count, and join that with the projects table to list the project name:
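Putting the pieces together might look like the sketch below. The aliases `n` (supplies needed) and `a` (supply groups available) are assumed names; a project qualifies only when the two counts match.

```sql
SELECT p.id, p.name
  FROM projects p
  JOIN (
        -- How many different supplies each project needs.
        SELECT projectid, COUNT(DISTINCT supplyid) AS ItemCount
          FROM projectsupplies
         GROUP BY projectid
       ) AS n ON n.projectid = p.id
  JOIN (
        -- How many of those needs can be met from supplies 2, 3, 4.
        SELECT g.projectid, COUNT(DISTINCT g.SupplyGroup) AS ItemCount
          FROM (
                SELECT ps.projectid, ps.supplyid AS SupplyGroup, ps.supplyid
                  FROM projectsupplies ps
                UNION
                SELECT ps.projectid, ps.supplyid AS SupplyGroup, ss.supplyid
                  FROM projectsupplies ps
                  JOIN substitutesupplies ss ON ss.projectsuppliesid = ps.id
               ) AS g
         WHERE g.supplyid IN (2, 3, 4)
         GROUP BY g.projectid
       ) AS a ON a.projectid = n.projectid
             AND a.ItemCount = n.ItemCount
 ORDER BY p.id;
-- Sample data: one row, (2, 'sharpen pencil')
```

Project 1 drops out because it needs two supplies but only one group (pencil or pen) is available; project 3 never appears in the second sub-select because paper is not in the list.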
Result
And, given the data, I believe that is the correct result.
Before the schema changed significantly
The original version of the query was against a different table structure, where there was no SubstituteSupplies table and the ProjectSupplies table had an extra column, Substitute, that often contained a null but, when it was not null, identified an alternative supply that would do. The question also lists (2,3,4,5) in the IN list, and the aggregate was compared with 4, not 3.
You might well be able to do it with a UNION of two inner joins in a sub-select:
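A sketch of such a sub-select, assuming the earlier ProjectSupplies layout described here (columns id, projectid, supplyid, and a nullable substitute); it does not apply to the revised tables.

```sql
-- Old schema only: projectsupplies.substitute is assumed to exist.
SELECT s.id AS supplyid, ps.projectid
  FROM supplies s
  JOIN projectsupplies ps ON ps.supplyid = s.id
UNION
-- Rows with a NULL substitute never match, so they drop out here.
SELECT s.id AS supplyid, ps.projectid
  FROM supplies s
  JOIN projectsupplies ps ON ps.substitute = s.id;
```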
That needs to be plugged into your main query:
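A hedged sketch of the combined query for that earlier schema, using the IN list (2,3,4,5) and the comparison with 4 mentioned above. The alias `u` is an assumed name, and this only shows how the UNION replaces the OR join; it still counts a supply and its substitute separately.

```sql
-- Old schema only: projectsupplies.substitute is assumed to exist.
SELECT p.name
  FROM (
        SELECT s.id AS supplyid, ps.projectid
          FROM supplies s
          JOIN projectsupplies ps ON ps.supplyid = s.id
        UNION
        SELECT s.id AS supplyid, ps.projectid
          FROM supplies s
          JOIN projectsupplies ps ON ps.substitute = s.id
       ) AS u
  JOIN projects p ON p.id = u.projectid
 WHERE u.supplyid IN (2, 3, 4, 5)
 GROUP BY p.id, p.name
HAVING COUNT(*) <= 4
 ORDER BY p.id;
```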