I’m trying to write a complex (at least, for my level of knowledge) query, but I’m having one hell of a time.
Here’s the problem. I have two tables, one named t1 and one named c1.
The tables are defined as follows:
table T1:
e_id, char(8),
e_date, datetime,
e_status, varchar(2)
table C1:
e_id, char(8),
e_date, datetime,
e_status, varchar(2)
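For anyone who wants to try this locally, the definitions above correspond roughly to the following DDL. This is only a sketch: the question states no keys, indexes, or NOT NULL constraints, so none are assumed here.

```sql
-- Sketch of the two tables as described; no keys or constraints are
-- given in the question, so none are declared.
CREATE TABLE t1 (
    e_id     CHAR(8),
    e_date   DATETIME,
    e_status VARCHAR(2)   -- 'OK' or 'R'
);

CREATE TABLE c1 (
    e_id     CHAR(8),
    e_date   DATETIME,
    e_status VARCHAR(2)   -- 'OK' or 'C'
);
```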
Each table contains a list of identifiers that may or may not be found in both tables (they may or may not be unique within each table), associated statuses (‘OK’ or ‘R’ in the T1 table, ‘OK’ or ‘C’ in the C1 table), and a datetime, e_date, associated with each occurrence of an e_id.
I’m trying to write a query that will:
- Retrieve all the e_id values in the T1 table that have an e_date within the last 24 hours (e_date is later than current time – 24h).
- Retrieve all the occurrences, still within table T1, of those e_ids within the last 30 days (e_date > now – 30 days). For example, if e_ids AAAAAAAA and BBBBBBBB are found in T1 with an e_date within the last 24 hours, retrieve all the occurrences of AAAAAAAA and BBBBBBBB in the same table that have an e_date within the last 30 days.
- Append the count of e_status = 'R' for each specific e_id found in the entire T1 table to the row results.
- Append the count of e_status = 'C' for each specific e_id found in the entire C1 table to the row results.
I’ll do my best to write some sample data/results here. For clarity, I will disregard the tables’ data types. Assume the current date and time are 2012-Nov-08 19:00:00.
T1:
- e_id: ‘A’, e_date: 2012-Nov-08 10:00:00, e_status: ‘OK’
- e_id: ‘A’, e_date: 2012-Nov-08 10:00:00, e_status: ‘R’
- e_id: ‘A’, e_date: 2012-Oct-15 10:00:00, e_status: ‘R’
- e_id: ‘B’, e_date: 2012-Oct-15 10:00:00, e_status: ‘OK’
- e_id: ‘A’, e_date: 2012-Oct-15 10:00:00, e_status: ‘OK’
- e_id: ‘A’, e_date: 2012-Oct-15 10:00:00, e_status: ‘R’
- e_id: ‘A’, e_date: 2012-Oct-15 10:00:00, e_status: ‘R’
- e_id: ‘A’, e_date: 2010-Jan-01 10:00:00, e_status: ‘R’
- e_id: ‘A’, e_date: 2010-Jan-01 10:00:00, e_status: ‘R’
C1:
- e_id: ‘A’, e_date: 2012-Oct-01 10:00:00, e_status: ‘C’
- e_id: ‘B’, e_date: 2012-Oct-01 10:00:00, e_status: ‘OK’
- e_id: ‘A’, e_date: 2012-Oct-01 10:00:00, e_status: ‘C’
- e_id: ‘B’, e_date: 2012-Oct-01 10:00:00, e_status: ‘OK’
- e_id: ‘A’, e_date: 2012-Oct-01 10:00:00, e_status: ‘OK’
Running the query would yield:
e_id, e_date, e_status, r_count, c_count
1. e_id: ‘A’, e_date: 2012-Nov-08 10:00:00, e_status: ‘OK’, r_count: 6, c_count: 2
2. e_id: ‘A’, e_date: 2012-Nov-08 10:00:00, e_status: ‘R’, r_count: 6, c_count: 2
3. e_id: ‘A’, e_date: 2012-Oct-15 10:00:00, e_status: ‘R’, r_count: 6, c_count: 2
4. e_id: ‘A’, e_date: 2012-Oct-15 10:00:00, e_status: ‘OK’, r_count: 6, c_count: 2
5. e_id: ‘A’, e_date: 2012-Oct-15 10:00:00, e_status: ‘R’, r_count: 6, c_count: 2
6. e_id: ‘A’, e_date: 2012-Oct-15 10:00:00, e_status: ‘R’, r_count: 6, c_count: 2
I am really sorry, I have had to change the date on T1 rows 3 to 7 (rows 3, 4, 5 and 6 of the results) as the values were erroneous.
T1 row 4 was not returned because no e_id ‘B’ was found in the last 24 hours.
T1 rows 8 and 9 were not returned because they were outside of the last 30 days.
Time to do some TDQD — Test-Driven Query Design.
Rows in T1 from the last 24 hours
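A minimal sketch of this step, assuming MySQL's NOW() and DATE_SUB() and the column names from the question. The DISTINCT is there because an e_id may occur more than once in the window.

```sql
-- e_ids with at least one T1 row in the last 24 hours
SELECT DISTINCT e_id
  FROM t1
 WHERE e_date BETWEEN DATE_SUB(NOW(), INTERVAL 1 DAY) AND NOW();
```

With the sample data and a current time of 2012-Nov-08 19:00:00, only ‘A’ has rows inside the window.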
This will be a prevalent sub-query in the other parts of the query.
Rows in T1 from the last 30 days…
…where there was an entry in T1 within the last 24 hours.
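One way to write this, as a sketch: join T1 to the 24-hour sub-query and filter the outer rows to 30 days. The alias t2 is just an illustrative name.

```sql
-- T1 rows from the last 30 days, restricted to e_ids seen in the last 24 hours
SELECT t1.e_id, t1.e_date, t1.e_status
  FROM t1
  JOIN (SELECT DISTINCT e_id
          FROM t1
         WHERE e_date BETWEEN DATE_SUB(NOW(), INTERVAL 1 DAY) AND NOW()
       ) AS t2
    ON t1.e_id = t2.e_id
 WHERE t1.e_date BETWEEN DATE_SUB(NOW(), INTERVAL 30 DAY) AND NOW();
```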
We can add other columns as we need them.
Count of rows in T1 with status ‘R’ …
…where there was an entry in T1 within the last 24 hours
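A sketch of the ‘R’ count, again joining to the 24-hour sub-query. There is deliberately no date filter on the outer T1 rows, since the question asks for the count over the entire table; r_count matches the column name in the expected results.

```sql
-- Per-e_id count of status 'R' over all of T1,
-- for e_ids seen in T1 in the last 24 hours
SELECT t1.e_id, COUNT(*) AS r_count
  FROM t1
  JOIN (SELECT DISTINCT e_id
          FROM t1
         WHERE e_date BETWEEN DATE_SUB(NOW(), INTERVAL 1 DAY) AND NOW()
       ) AS t2
    ON t1.e_id = t2.e_id
 WHERE t1.e_status = 'R'
 GROUP BY t1.e_id;
```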
Count of rows in C1 with status ‘C’ …
…where there was an entry in T1 within the last 24 hours
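The C1 count follows the same pattern. This is a sketch under the same assumptions, counting C1 rows but still using T1 to decide which e_ids are of interest.

```sql
-- Per-e_id count of status 'C' over all of C1,
-- for e_ids seen in T1 in the last 24 hours
SELECT c1.e_id, COUNT(*) AS c_count
  FROM c1
  JOIN (SELECT DISTINCT e_id
          FROM t1
         WHERE e_date BETWEEN DATE_SUB(NOW(), INTERVAL 1 DAY) AND NOW()
       ) AS t2
    ON c1.e_id = t2.e_id
 WHERE c1.e_status = 'C'
 GROUP BY c1.e_id;
```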
Assemble the set of queries to produce the result
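Putting the pieces together might look like the following sketch. The aliases t2, r and c are illustrative, and the two count sub-queries are attached with LEFT JOIN so that an e_id with no ‘R’ or ‘C’ rows is still returned.

```sql
SELECT t1.e_id, t1.e_date, t1.e_status, r.r_count, c.c_count
  FROM t1
  -- e_ids seen in T1 in the last 24 hours
  JOIN (SELECT DISTINCT e_id
          FROM t1
         WHERE e_date BETWEEN DATE_SUB(NOW(), INTERVAL 1 DAY) AND NOW()
       ) AS t2
    ON t1.e_id = t2.e_id
  -- count of 'R' rows per e_id over all of T1
  LEFT JOIN
       (SELECT t1.e_id, COUNT(*) AS r_count
          FROM t1
          JOIN (SELECT DISTINCT e_id
                  FROM t1
                 WHERE e_date BETWEEN DATE_SUB(NOW(), INTERVAL 1 DAY) AND NOW()
               ) AS t2
            ON t1.e_id = t2.e_id
         WHERE t1.e_status = 'R'
         GROUP BY t1.e_id
       ) AS r
    ON t1.e_id = r.e_id
  -- count of 'C' rows per e_id over all of C1
  LEFT JOIN
       (SELECT c1.e_id, COUNT(*) AS c_count
          FROM c1
          JOIN (SELECT DISTINCT e_id
                  FROM t1
                 WHERE e_date BETWEEN DATE_SUB(NOW(), INTERVAL 1 DAY) AND NOW()
               ) AS t2
            ON c1.e_id = t2.e_id
         WHERE c1.e_status = 'C'
         GROUP BY c1.e_id
       ) AS c
    ON t1.e_id = c.e_id
 WHERE t1.e_date BETWEEN DATE_SUB(NOW(), INTERVAL 30 DAY) AND NOW();
```

When an e_id has no matching rows in a count sub-query, the LEFT JOIN leaves that count as NULL; wrapping the column as COALESCE(r.r_count, 0) would show 0 instead, if that is preferred.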
You could probably write the sub-queries without the 24-hour sub-sub-query, but eliminating as many rows as early as possible is likely to be more efficient.
One advantage of the concept behind TDQD is that you can check interim results. There were some basically trivial syntax issues (in part because MySQL is not my primary DBMS), but the change from JOIN to LEFT JOIN for the two COUNT sub-queries is the sort of thing you’re apt to spot as you assemble the query. Trying to get everything right first time is — hard, if not futile. But the step-by-step build-up can give you confidence in what you’ve done. I’d never build a query as complex as this from scratch without testing the component sub-queries.
Thanks for the (minor) updates, FatalMojo.