I’ve spent a lot of time optimizing this query, but it is starting to slow down as the tables grow. I realize questions like this are hard to answer without the full picture, but I’m looking for some guidance. I’m not at liberty to disclose the database schema, so hopefully this is enough information. Thanks.
SELECT tblA.id, tblB.id, tblC.id, tblD.id
FROM tblA, tblB, tblC, tblD
INNER JOIN (SELECT max(tblB.id) AS xid
FROM tblB
WHERE tblB.rdd = 11305
GROUP BY tblB.index_id
ORDER BY NULL) AS rddx
ON tblB.id = rddx.xid
WHERE
tblA.id = tblB.index_id
AND tblC.name = tblD.s_type
AND tblD.name = tblA.s_name
GROUP BY tblA.s_name
ORDER BY NULL;
There is a one-to-many relationship between:
- tblA.id and tblB.index_id
- tblC.name and tblD.s_type
- tblD.name and tblA.s_name
+----+-------------+------------+--------+---------------+-----------+---------+------------------------------+-------+------------------------------+
| id | select_type | table      | type   | possible_keys | key       | key_len | ref                          | rows  | Extra                        |
+----+-------------+------------+--------+---------------+-----------+---------+------------------------------+-------+------------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL          | NULL      | NULL    | NULL                         | 32568 | Using temporary              |
|  1 | PRIMARY     | tblB       | eq_ref | PRIMARY       | PRIMARY   | 8       | rddx.xid                     |     1 |                              |
|  1 | PRIMARY     | tblA       | eq_ref | PRIMARY       | PRIMARY   | 8       | tblB.index_id                |     1 | Using where                  |
|  1 | PRIMARY     | tblD       | eq_ref | PRIMARY       | PRIMARY   | 22      | tblA.s_name                  |     1 | Using where                  |
|  1 | PRIMARY     | tblC       | eq_ref | PRIMARY       | PRIMARY   | 22      | tblD.s_type                  |     1 |                              |
|  2 | DERIVED     | tblB       | ref    | rdd_idx       | rdd_idx   | 7       |                              | 65722 | Using where; Using temporary |
+----+-------------+------------+--------+---------------+-----------+---------+------------------------------+-------+------------------------------+
I have rewritten the query with explicit JOIN ... ON clauses instead of putting the join conditions in the WHERE clause. Written this way, a developer can see the relationships between the tables at a glance:
A->B, A->D and D->C. For table B, you want the highest ID among the rows that match tblA.id = tblB.index_id and have rdd = 11305, and that does not require a complete sub-query. Instead, the MAX() moves up into the field selection list. I would make sure tblB has an index on (index_id, rdd). Finally, STRAIGHT_JOIN tells MySQL to join the tables in the order they are listed in the query.
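A minimal sketch of the join-based rewrite described above, using the table and column names from the question. The index name `idx_tblB_index_rdd` and the alias `max_b_id` are hypothetical, and the inner join to tblB is an assumption. As in the original query, the non-aggregated columns in the select list rely on MySQL running without `ONLY_FULL_GROUP_BY`.

```sql
-- Hypothetical index name; column order follows the (index_id, rdd) suggestion.
CREATE INDEX idx_tblB_index_rdd ON tblB (index_id, rdd);

-- STRAIGHT_JOIN asks MySQL to join the tables in the order they are listed.
SELECT STRAIGHT_JOIN
       tblA.id,
       MAX(tblB.id) AS max_b_id,        -- highest matching tblB.id per group
       tblC.id,
       tblD.id
FROM tblA
JOIN tblB ON tblB.index_id = tblA.id    -- A -> B (inner join is an assumption)
         AND tblB.rdd = 11305
JOIN tblD ON tblD.name = tblA.s_name    -- A -> D
JOIN tblC ON tblC.name = tblD.s_type    -- D -> C
GROUP BY tblA.s_name
ORDER BY NULL;
```

Note that MAX(tblB.id) only returns the highest ID itself; any other tblB column selected here would not necessarily come from that same row.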
— EDIT FROM COMMENT —
It appears you are getting NULLs from tblB. That typically means there is a valid tblA record but no tblB record for the same ID that has rdd = 11305. Since you appear to be concerned only with entries associated with 11305, I’m adjusting the query accordingly. Please make sure you have an index on tblB with rdd as its first column (a multi-column index is fine, as long as rdd comes first).
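One way such an index could be declared is sketched below. The index name is hypothetical, and adding index_id as a second column is an assumption intended to help the grouping step. The EXPLAIN output in the question already shows an index called rdd_idx, but its exact definition is not given.

```sql
-- Hypothetical name; rdd comes first so WHERE rdd = 11305 can use the index.
-- index_id is added (an assumption) so the grouping can read from the same index.
ALTER TABLE tblB ADD INDEX idx_tblB_rdd_index (rdd, index_id);

-- Check which index MySQL picks for the filtered, grouped lookup.
EXPLAIN
SELECT index_id, MAX(id)
FROM tblB
WHERE rdd = 11305
GROUP BY index_id;
```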
In this version, I pre-query table B for only the 11305 entries and pre-group by index_id (the link to tblA). This gives one record per index_id wherever one exists. From that result, I join back to A, then directly back to B again using the highest matching ID found, then to D and C as before. Now you can pull any column from any of the tables and get the proper record in question. There should be no NULL values left in this query.
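The steps above can be sketched roughly as follows. This is an illustration under assumptions, not necessarily the exact query intended: the aliases `pre_query`, `b1`, and `highest_id` are hypothetical, and the original GROUP BY tblA.s_name is left out.

```sql
SELECT STRAIGHT_JOIN
       tblA.id,
       tblB.id,
       tblC.id,
       tblD.id
FROM (
       -- One row per index_id: the highest tblB.id among the rdd = 11305 entries.
       SELECT b1.index_id,
              MAX(b1.id) AS highest_id
       FROM tblB AS b1
       WHERE b1.rdd = 11305
       GROUP BY b1.index_id
     ) AS pre_query
JOIN tblA ON tblA.id = pre_query.index_id      -- back to A
JOIN tblB ON tblB.id = pre_query.highest_id    -- back to B, on the highest ID found
JOIN tblD ON tblD.name = tblA.s_name           -- then D
JOIN tblC ON tblC.name = tblD.s_type;          -- then C
```

Because every join is an inner join that starts from rows known to exist in tblB, a tblA record with no 11305 entry simply drops out of the result rather than producing NULLs. If one row per s_name is still required, the GROUP BY from the original query would need to be added back.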
Hopefully, I’ve clarified HOW I’m getting the pieces together for you.