I have two tables in MySQL that I’m comparing with the following attributes:
tbl_fac: facility_id, chemical_id, criteria
10, 25, 50
10, 26, 60
10, 27, 60
11, 25, 30
11, 27, 31
etc.
tbl_samp: sample_id, chemical_id, result
5, 25, 51
5, 26, 61
6, 25, 51
6, 26, 61
6, 27, 500
etc.
These tables are joined on chemical_id (a many-to-many relationship, ugh). There are several thousand facility_ids, each with several hundred chemical_ids, and several thousand sample_ids, each also with several hundred chemical_ids. All in all, there are around 500,000 records in tbl_fac and 1,000,000+ records in tbl_samp.
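To make the many-to-many fan-out concrete, here is a minimal sketch that loads the example rows into an in-memory SQLite database and counts how many joined rows each sample produces when the tables are joined on chemical_id alone. SQLite is only a runnable stand-in for MySQL here; the SQL is plain and should carry over, but it has not been checked on MySQL.

```python
import sqlite3

# Example rows copied from the question.
FAC_ROWS = [(10, 25, 50), (10, 26, 60), (10, 27, 60), (11, 25, 30), (11, 27, 31)]
SAMP_ROWS = [(5, 25, 51), (5, 26, 61), (6, 25, 51), (6, 26, 61), (6, 27, 500)]

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the MySQL database
conn.executescript("""
    CREATE TABLE tbl_fac  (facility_id INTEGER, chemical_id INTEGER, criteria REAL);
    CREATE TABLE tbl_samp (sample_id INTEGER, chemical_id INTEGER, result REAL);
""")
conn.executemany("INSERT INTO tbl_fac VALUES (?, ?, ?)", FAC_ROWS)
conn.executemany("INSERT INTO tbl_samp VALUES (?, ?, ?)", SAMP_ROWS)

# Joining on chemical_id alone matches each sample row with every facility
# that lists the same chemical, so the row count multiplies.
fanout = conn.execute("""
    SELECT s.sample_id, COUNT(*) AS joined_rows
    FROM tbl_samp AS s
    JOIN tbl_fac AS f ON f.chemical_id = s.chemical_id
    GROUP BY s.sample_id
    ORDER BY s.sample_id
""").fetchall()

for sample_id, joined_rows in fanout:
    print(f"sample {sample_id}: {joined_rows} joined rows")
```

With the example data, the five tbl_samp rows become eight joined rows (three for sample 5, five for sample 6), because chemicals 25 and 27 are each listed by two facilities. At the full scale described, the same multiplication is what makes a naive join expensive.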
I’m trying to extract three groups of sample_ids from this dataset:
Group 1: any sample_id where tbl_samp.result > tbl_fac.criteria (i.e., result exceeds criteria)
Group 2: any sample_id where tbl_samp.result < tbl_fac.criteria, AND all tbl_fac.chemical_ids are present for that sample_id (i.e., result is less than criteria, and everything is there)
Group 3: any sample_id where tbl_samp.result < tbl_fac.criteria, BUT one or more tbl_fac.chemical_ids are missing from that sample_id (i.e., result is less than criteria, but something is missing)
Here’s the question: how do I get all three groups efficiently in one query?
I’ve tried:
select *
from tbl_fac
left join tbl_samp
on tbl_fac.chemical_id = tbl_samp.chemical_id
But this only finds chemicals that are missing from the entire dataset, not from individual samples. I have a hackish query working that uses a third table to join tbl_fac and tbl_samp, but it is so ugly I’m actually embarrassed to post it.
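The LEFT JOIN on chemical_id alone only produces a NULL when no sample anywhere contains the chemical, so per-sample gaps never show up. A minimal sketch of one possible alternative: first pair every facility chemical with every distinct sample_id, then LEFT JOIN tbl_samp on both chemical_id and sample_id. It uses SQLite as a runnable stand-in for MySQL and the example rows from the question; it is a suggestion, not a tuned solution for the full dataset.

```python
import sqlite3

# Example rows copied from the question.
FAC_ROWS = [(10, 25, 50), (10, 26, 60), (10, 27, 60), (11, 25, 30), (11, 27, 31)]
SAMP_ROWS = [(5, 25, 51), (5, 26, 61), (6, 25, 51), (6, 26, 61), (6, 27, 500)]

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the MySQL database
conn.executescript("""
    CREATE TABLE tbl_fac  (facility_id INTEGER, chemical_id INTEGER, criteria REAL);
    CREATE TABLE tbl_samp (sample_id INTEGER, chemical_id INTEGER, result REAL);
""")
conn.executemany("INSERT INTO tbl_fac VALUES (?, ?, ?)", FAC_ROWS)
conn.executemany("INSERT INTO tbl_samp VALUES (?, ?, ?)", SAMP_ROWS)

# The query from the question: NULLs appear only for chemicals that no
# sample contains at all.
dataset_missing = conn.execute("""
    SELECT f.facility_id, f.chemical_id
    FROM tbl_fac AS f
    LEFT JOIN tbl_samp AS s ON f.chemical_id = s.chemical_id
    WHERE s.sample_id IS NULL
""").fetchall()

# Per-sample check: build every (facility chemical, sample_id) combination,
# then LEFT JOIN on both keys so a NULL means "this sample lacks it".
per_sample_missing = conn.execute("""
    SELECT f.facility_id, ids.sample_id, f.chemical_id
    FROM tbl_fac AS f
    CROSS JOIN (SELECT DISTINCT sample_id FROM tbl_samp) AS ids
    LEFT JOIN tbl_samp AS s
           ON s.chemical_id = f.chemical_id
          AND s.sample_id = ids.sample_id
    WHERE s.sample_id IS NULL
    ORDER BY f.facility_id, ids.sample_id, f.chemical_id
""").fetchall()

print("missing from whole dataset:", dataset_missing)
print("missing per sample (facility, sample, chemical):", per_sample_missing)
```

On the example data the original query reports nothing missing, while the per-sample version shows that sample 5 lacks chemical 27, which both facility 10 and facility 11 list.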
As always, many thanks in advance for your thoughts on this one!
Cheers,
Josh
EDIT: Ideally, I would like the sample_id and its group returned, with just one group per sample_id (my knowledge of the data indicates that each sample will always fall into one of the three categories above).
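One way to get all three groups in a single pass is a CASE expression over an aggregated LEFT JOIN. This is a minimal sketch under several assumptions: SQLite stands in for MySQL; (facility_id, chemical_id) and (sample_id, chemical_id) are assumed unique; a result equal to its criterion counts as "not exceeding"; and, because the question doesn't say which facility a sample belongs to, it classifies each (facility_id, sample_id) pair. Collapsing that to one group per sample_id needs a rule the question doesn't give. Samples 7 and 8 are hypothetical rows added so that Groups 2 and 3 actually appear.

```python
import sqlite3

# Example rows from the question plus two HYPOTHETICAL samples:
# sample 7 has every chemical, all below criteria; sample 8 has only chemical 25.
FAC_ROWS = [(10, 25, 50), (10, 26, 60), (10, 27, 60), (11, 25, 30), (11, 27, 31)]
SAMP_ROWS = [(5, 25, 51), (5, 26, 61), (6, 25, 51), (6, 26, 61), (6, 27, 500),
             (7, 25, 10), (7, 26, 20), (7, 27, 30), (8, 25, 10)]

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.executescript("""
    -- Assumed unique keys (not stated in the question).
    CREATE TABLE tbl_fac  (facility_id INTEGER, chemical_id INTEGER, criteria REAL,
                           PRIMARY KEY (facility_id, chemical_id));
    CREATE TABLE tbl_samp (sample_id INTEGER, chemical_id INTEGER, result REAL,
                           PRIMARY KEY (sample_id, chemical_id));
""")
conn.executemany("INSERT INTO tbl_fac VALUES (?, ?, ?)", FAC_ROWS)
conn.executemany("INSERT INTO tbl_samp VALUES (?, ?, ?)", SAMP_ROWS)

# One row per facility chemical per sample; s.* is NULL where the sample
# lacks that chemical. The CASE then picks exactly one group per pair.
classified = conn.execute("""
    SELECT f.facility_id,
           ids.sample_id,
           CASE
             WHEN SUM(CASE WHEN s.result > f.criteria THEN 1 ELSE 0 END) > 0 THEN 1
             WHEN COUNT(s.chemical_id) = COUNT(*) THEN 2  -- nothing missing
             ELSE 3                                       -- something missing
           END AS grp
    FROM tbl_fac AS f
    CROSS JOIN (SELECT DISTINCT sample_id FROM tbl_samp) AS ids
    LEFT JOIN tbl_samp AS s
           ON s.sample_id = ids.sample_id
          AND s.chemical_id = f.chemical_id
    GROUP BY f.facility_id, ids.sample_id
    ORDER BY f.facility_id, ids.sample_id
""").fetchall()

for facility_id, sample_id, grp in classified:
    print(f"facility {facility_id}, sample {sample_id}: Group {grp}")
```

COUNT(*) counts the facility's chemicals (one joined row each), while COUNT(s.chemical_id) skips the NULLs left by missing chemicals, so comparing the two detects gaps. Be aware that enumerating every facility/sample pair at the stated scale (about 500,000 tbl_fac rows times several thousand samples) means hundreds of millions of row combinations or more; an index on tbl_samp (sample_id, chemical_id) helps the join, but restricting the pairs to the ones you actually need would matter more.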
This answer assumes a unique constraint on (facility_id, chemical_id) in tbl_fac and a unique constraint on (sample_id, chemical_id) in tbl_samp. I built up the query one step at a time; whether it is efficient remains to be seen.
Group 1: any sample_id where tbl_samp.result > tbl_fac.criteria (i.e., result exceeds criteria)
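A minimal sketch of one way a Group 1 query could be written (not necessarily the answerer's original query): any (facility_id, sample_id) pair with at least one chemical whose result exceeds that facility's criterion. It runs on SQLite as a stand-in for MySQL, reports pairs because the question doesn't say which facility a sample belongs to, and adds hypothetical samples 7 and 8 (which never exceed) to show they are excluded.

```python
import sqlite3

# Example rows from the question plus two HYPOTHETICAL samples (7 and 8)
# whose results are below every criterion.
FAC_ROWS = [(10, 25, 50), (10, 26, 60), (10, 27, 60), (11, 25, 30), (11, 27, 31)]
SAMP_ROWS = [(5, 25, 51), (5, 26, 61), (6, 25, 51), (6, 26, 61), (6, 27, 500),
             (7, 25, 10), (7, 26, 20), (7, 27, 30), (8, 25, 10)]

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.executescript("""
    -- Unique keys as assumed in the answer.
    CREATE TABLE tbl_fac  (facility_id INTEGER, chemical_id INTEGER, criteria REAL,
                           PRIMARY KEY (facility_id, chemical_id));
    CREATE TABLE tbl_samp (sample_id INTEGER, chemical_id INTEGER, result REAL,
                           PRIMARY KEY (sample_id, chemical_id));
""")
conn.executemany("INSERT INTO tbl_fac VALUES (?, ?, ?)", FAC_ROWS)
conn.executemany("INSERT INTO tbl_samp VALUES (?, ?, ?)", SAMP_ROWS)

# DISTINCT collapses pairs where several chemicals exceed.
group1 = conn.execute("""
    SELECT DISTINCT f.facility_id, s.sample_id
    FROM tbl_fac AS f
    JOIN tbl_samp AS s ON s.chemical_id = f.chemical_id
    WHERE s.result > f.criteria
    ORDER BY f.facility_id, s.sample_id
""").fetchall()

print(group1)
```

An inner join is enough here: a chemical the sample lacks cannot exceed anything, so missing rows don't matter for Group 1.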
Group 2: any sample_id where tbl_samp.result < tbl_fac.criteria, AND all tbl_fac.chemical_ids are present for that sample_id (i.e., result is less than criteria, and everything is there)
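Group 2 can be sketched with two NOT EXISTS checks per (facility_id, sample_id) pair: no chemical exceeds its criterion, and no chemical the facility lists is absent from the sample. This is a hedged sketch, not necessarily the answerer's query: SQLite stands in for MySQL, a result equal to its criterion counts as not exceeding, and samples 7 (complete, all below criteria) and 8 (only chemical 25) are hypothetical rows added so the group is non-empty.

```python
import sqlite3

# Example rows from the question plus two HYPOTHETICAL samples:
# 7 has every chemical below criteria; 8 has only chemical 25.
FAC_ROWS = [(10, 25, 50), (10, 26, 60), (10, 27, 60), (11, 25, 30), (11, 27, 31)]
SAMP_ROWS = [(5, 25, 51), (5, 26, 61), (6, 25, 51), (6, 26, 61), (6, 27, 500),
             (7, 25, 10), (7, 26, 20), (7, 27, 30), (8, 25, 10)]

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.executescript("""
    CREATE TABLE tbl_fac  (facility_id INTEGER, chemical_id INTEGER, criteria REAL,
                           PRIMARY KEY (facility_id, chemical_id));
    CREATE TABLE tbl_samp (sample_id INTEGER, chemical_id INTEGER, result REAL,
                           PRIMARY KEY (sample_id, chemical_id));
""")
conn.executemany("INSERT INTO tbl_fac VALUES (?, ?, ?)", FAC_ROWS)
conn.executemany("INSERT INTO tbl_samp VALUES (?, ?, ?)", SAMP_ROWS)

group2 = conn.execute("""
    SELECT fids.facility_id, ids.sample_id
    FROM (SELECT DISTINCT facility_id FROM tbl_fac) AS fids
    CROSS JOIN (SELECT DISTINCT sample_id FROM tbl_samp) AS ids
    WHERE NOT EXISTS (      -- no chemical exceeds its criterion
              SELECT 1
              FROM tbl_fac AS f
              JOIN tbl_samp AS s ON s.chemical_id = f.chemical_id
              WHERE f.facility_id = fids.facility_id
                AND s.sample_id = ids.sample_id
                AND s.result > f.criteria)
      AND NOT EXISTS (      -- no facility chemical is missing from the sample
              SELECT 1
              FROM tbl_fac AS f
              WHERE f.facility_id = fids.facility_id
                AND NOT EXISTS (SELECT 1 FROM tbl_samp AS s
                                WHERE s.sample_id = ids.sample_id
                                  AND s.chemical_id = f.chemical_id))
    ORDER BY fids.facility_id, ids.sample_id
""").fetchall()

print(group2)
```

The nested NOT EXISTS reads as "there is no facility chemical for which there is no matching sample row", which is the usual SQL spelling of "all chemicals are present".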
Group 3: any sample_id where tbl_samp.result < tbl_fac.criteria, BUT one or more tbl_fac.chemical_ids are missing from that sample_id (i.e., result is less than criteria, but something is missing)
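Group 3 differs from Group 2 only in the second check: nothing exceeds, but at least one facility chemical is absent from the sample. A minimal sketch under the same assumptions (SQLite as a stand-in for MySQL, equality counted as not exceeding, pairs rather than bare sample_ids, and hypothetical samples 7 and 8 added to the question's rows); it is one possible formulation, not necessarily the answerer's.

```python
import sqlite3

# Example rows from the question plus two HYPOTHETICAL samples:
# 7 has every chemical below criteria; 8 has only chemical 25.
FAC_ROWS = [(10, 25, 50), (10, 26, 60), (10, 27, 60), (11, 25, 30), (11, 27, 31)]
SAMP_ROWS = [(5, 25, 51), (5, 26, 61), (6, 25, 51), (6, 26, 61), (6, 27, 500),
             (7, 25, 10), (7, 26, 20), (7, 27, 30), (8, 25, 10)]

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.executescript("""
    CREATE TABLE tbl_fac  (facility_id INTEGER, chemical_id INTEGER, criteria REAL,
                           PRIMARY KEY (facility_id, chemical_id));
    CREATE TABLE tbl_samp (sample_id INTEGER, chemical_id INTEGER, result REAL,
                           PRIMARY KEY (sample_id, chemical_id));
""")
conn.executemany("INSERT INTO tbl_fac VALUES (?, ?, ?)", FAC_ROWS)
conn.executemany("INSERT INTO tbl_samp VALUES (?, ?, ?)", SAMP_ROWS)

group3 = conn.execute("""
    SELECT fids.facility_id, ids.sample_id
    FROM (SELECT DISTINCT facility_id FROM tbl_fac) AS fids
    CROSS JOIN (SELECT DISTINCT sample_id FROM tbl_samp) AS ids
    WHERE NOT EXISTS (      -- no chemical exceeds its criterion
              SELECT 1
              FROM tbl_fac AS f
              JOIN tbl_samp AS s ON s.chemical_id = f.chemical_id
              WHERE f.facility_id = fids.facility_id
                AND s.sample_id = ids.sample_id
                AND s.result > f.criteria)
      AND EXISTS (          -- at least one facility chemical is missing
              SELECT 1
              FROM tbl_fac AS f
              WHERE f.facility_id = fids.facility_id
                AND NOT EXISTS (SELECT 1 FROM tbl_samp AS s
                                WHERE s.sample_id = ids.sample_id
                                  AND s.chemical_id = f.chemical_id))
    ORDER BY fids.facility_id, ids.sample_id
""").fetchall()

print(group3)
```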
Finally, you UNION all three queries together.
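The union step might look like the following sketch, where each branch tags its rows with a constant group number. It is a hedged illustration, not the answerer's exact query: SQLite stands in for MySQL, results are per (facility_id, sample_id) pair, equality counts as not exceeding, and samples 7 and 8 are hypothetical rows added to the question's data. UNION ALL is enough because pairs that exceed are excluded from branches 2 and 3, and those two branches disagree on whether something is missing, so no pair lands in two groups.

```python
import sqlite3

# Example rows from the question plus two HYPOTHETICAL samples:
# 7 has every chemical below criteria; 8 has only chemical 25.
FAC_ROWS = [(10, 25, 50), (10, 26, 60), (10, 27, 60), (11, 25, 30), (11, 27, 31)]
SAMP_ROWS = [(5, 25, 51), (5, 26, 61), (6, 25, 51), (6, 26, 61), (6, 27, 500),
             (7, 25, 10), (7, 26, 20), (7, 27, 30), (8, 25, 10)]

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.executescript("""
    CREATE TABLE tbl_fac  (facility_id INTEGER, chemical_id INTEGER, criteria REAL,
                           PRIMARY KEY (facility_id, chemical_id));
    CREATE TABLE tbl_samp (sample_id INTEGER, chemical_id INTEGER, result REAL,
                           PRIMARY KEY (sample_id, chemical_id));
""")
conn.executemany("INSERT INTO tbl_fac VALUES (?, ?, ?)", FAC_ROWS)
conn.executemany("INSERT INTO tbl_samp VALUES (?, ?, ?)", SAMP_ROWS)

# Every (facility_id, sample_id) combination; used by Groups 2 and 3.
PAIRS = """FROM (SELECT DISTINCT facility_id FROM tbl_fac) AS fids
    CROSS JOIN (SELECT DISTINCT sample_id FROM tbl_samp) AS ids"""
# True when some chemical in the sample exceeds the facility's criterion.
EXCEEDS = """EXISTS (SELECT 1 FROM tbl_fac AS f
                     JOIN tbl_samp AS s ON s.chemical_id = f.chemical_id
                     WHERE f.facility_id = fids.facility_id
                       AND s.sample_id = ids.sample_id
                       AND s.result > f.criteria)"""
# True when the facility lists a chemical that the sample lacks.
MISSING = """EXISTS (SELECT 1 FROM tbl_fac AS f
                     WHERE f.facility_id = fids.facility_id
                       AND NOT EXISTS (SELECT 1 FROM tbl_samp AS s
                                       WHERE s.sample_id = ids.sample_id
                                         AND s.chemical_id = f.chemical_id))"""

query = f"""
    SELECT DISTINCT f.facility_id, s.sample_id, 1 AS grp
    FROM tbl_fac AS f
    JOIN tbl_samp AS s ON s.chemical_id = f.chemical_id
    WHERE s.result > f.criteria
    UNION ALL
    SELECT fids.facility_id, ids.sample_id, 2 {PAIRS}
    WHERE NOT {EXCEEDS} AND NOT {MISSING}
    UNION ALL
    SELECT fids.facility_id, ids.sample_id, 3 {PAIRS}
    WHERE NOT {EXCEEDS} AND {MISSING}
"""
groups = sorted(conn.execute(query).fetchall())

for facility_id, sample_id, grp in groups:
    print(f"facility {facility_id}, sample {sample_id}: Group {grp}")
```

Building the shared EXISTS fragments as Python strings is only a convenience for this sketch; in MySQL you would write the three SELECTs out in full.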