I’ve been struggling with a query that selects from multiple tables. My original query was incredibly slow (53 seconds). From reading up, I’m now reasonably sure that I need an inner query to limit the rows that are iterated over, but I’m not sure how to use the result of the subquery (inner query) when more than two tables are involved. Below are some dummy tables:
+-------+---------------------+------------+
| tr_id | tr_datecreated      | tr_depart  |
+-------+---------------------+------------+
|     1 | 2011-07-31 00:00:00 | 2011-08-20 |
|     2 | 2011-08-01 00:00:00 | 2011-08-30 |
|     3 | 2011-08-02 00:00:00 | 2011-09-01 |
+-------+---------------------+------------+
+------+--------+---------+---------+
| p_id | p_trid | p_name  | p_lname |
+------+--------+---------+---------+
|    1 |      1 | Geoff   | Thingy  |
|    2 |      1 | Mildred | Thingy  |
|    3 |      1 | Garry   | Thingy  |
|    4 |      2 | Linda   | Doobrey |
|    5 |      2 | Kev     | Doobrey |
|    6 |      3 | John    | Wotsit  |
|    7 |      3 | Jill    | Wotsit  |
+------+--------+---------+---------+
+------+--------+----------+
| h_id | h_trid | h_dest   |
+------+--------+----------+
|    1 |      1 | France   |
|    2 |      1 | Spain    |
|    3 |      2 | Italy    |
|    4 |      3 | Portugal |
+------+--------+----------+
I want to get a result such as:
+-------+---------------------+------------+---------+---------+----------+
| tr_id | tr_datecreated      | tr_depart  | p_name  | p_lname | h_dest   |
+-------+---------------------+------------+---------+---------+----------+
|     1 | 2011-07-31 00:00:00 | 2011-08-20 | Geoff   | Thingy  | France   |
|     1 | 2011-07-31 00:00:00 | 2011-08-20 | Geoff   | Thingy  | Spain    |
|     1 | 2011-07-31 00:00:00 | 2011-08-20 | Mildred | Thingy  | France   |
|     1 | 2011-07-31 00:00:00 | 2011-08-20 | Mildred | Thingy  | Spain    |
|     1 | 2011-07-31 00:00:00 | 2011-08-20 | Garry   | Thingy  | France   |
|     1 | 2011-07-31 00:00:00 | 2011-08-20 | Garry   | Thingy  | Spain    |
|     2 | 2011-08-01 00:00:00 | 2011-08-30 | Linda   | Doobrey | Italy    |
|     2 | 2011-08-01 00:00:00 | 2011-08-30 | Kev     | Doobrey | Italy    |
|     3 | 2011-08-02 00:00:00 | 2011-09-01 | John    | Wotsit  | Portugal |
|     3 | 2011-08-02 00:00:00 | 2011-09-01 | Jill    | Wotsit  | Portugal |
+-------+---------------------+------------+---------+---------+----------+
where we get a separate row for each person for each holiday destination.
My original effort was in the form of:
SELECT tr_id, tr_datecreated, tr_depart, p_name, p_lname, h_dest
FROM transaction, people, holiday
WHERE tr_id = p_trid
AND tr_id = h_trid
AND tr_datecreated >= "2010-12-12 00:00:00"
AND tr_datecreated <= "2012-12-12 00:00:00"
I think that this created a huge number of cross joins and the query ran very slowly.
Since tr_id is referenced several times, I wanted an inner query that reduces the number of rows everything else is compared against.
So the inner query part will be:
SELECT tr_id FROM transaction WHERE tr_datecreated >= "2010-12-12 00:00:00"
AND tr_datecreated <= "2012-12-12 00:00:00"
How would I build my desired result so that both p_trid and h_trid are compared against the same inner query, without running that inner query twice (if possible)?
Would inner joins help in this situation? (I have read through but haven’t fully absorbed it yet).
Grateful for any advice and suggestions here. The database is large and I need to be efficient.
Edit
Indexes:
tr_id, h_id and p_id are all primary keys
Result of EXPLAIN
+----+-------------+--------------+--------+---------------+---------+---------+---------------------+------+--------------------------------+
| id | select_type | table        | type   | possible_keys | key     | key_len | ref                 | rows | Extra                          |
+----+-------------+--------------+--------+---------------+---------+---------+---------------------+------+--------------------------------+
|  1 | SIMPLE      | holiday      | ALL    | NULL          | NULL    | NULL    | NULL                |    4 |                                |
|  1 | SIMPLE      | people       | ALL    | NULL          | NULL    | NULL    | NULL                |    7 | Using where; Using join buffer |
|  1 | SIMPLE      | transactions | eq_ref | PRIMARY       | PRIMARY | 4       | db.people.p_trid    |    1 | Using where                    |
+----+-------------+--------------+--------+---------------+---------+---------+---------------------+------+--------------------------------+
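The EXPLAIN output above shows type ALL and no possible_keys for holiday and people, so both tables are scanned in full; only the primary keys are indexed. One possible remedy, offered as a sketch rather than a tested fix, is to index the columns used in the join conditions and in the date filter. The index names below are hypothetical, and the table name follows the query in the question (transaction), although the EXPLAIN output shows transactions; use whichever exists in the real schema.

```sql
-- Hypothetical index names; the columns come from the tables in the question.
-- Index the foreign-key columns used in the join conditions.
CREATE INDEX idx_people_trid  ON people (p_trid);
CREATE INDEX idx_holiday_trid ON holiday (h_trid);

-- Supports the range filter on the creation date.
CREATE INDEX idx_tr_datecreated ON transaction (tr_datecreated);
```

Re-running EXPLAIN afterwards should show whether the optimizer actually picks these indexes up.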
I think that this should work. Let me know if it works.
Total Query
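A sketch of how the full query could look, assuming the table and column names from the question and the aliases 'id', 'date', 'depart' and ‘t’ mentioned in the subquery explanation. The join layout is an assumption and may differ from the query originally posted here.

```sql
-- Sketch only: names come from the question, aliases from the explanation.
SELECT t.id, t.date, t.depart, p.p_name, p.p_lname, h.h_dest
FROM (
    -- Derived table: keeps only the transactions in the date range
    SELECT tr_id          AS id,
           tr_datecreated AS date,
           tr_depart      AS depart
    FROM transaction
    WHERE tr_datecreated >= '2010-12-12 00:00:00'
      AND tr_datecreated <= '2012-12-12 00:00:00'
) AS t
-- Both joins compare against the same derived table, so the
-- subquery is written only once.
INNER JOIN people  AS p ON p.p_trid = t.id
INNER JOIN holiday AS h ON h.h_trid = t.id;
```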
Inner Query
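On its own, the inner query might look like the following sketch, again assuming the table and column names from the question; it can be run by itself to check which transactions fall inside the date range.

```sql
-- Sketch only: selects the transactions created within the date range
-- and gives each column a shorter alias.
SELECT tr_id          AS id,
       tr_datecreated AS date,
       tr_depart      AS depart
FROM transaction
WHERE tr_datecreated >= '2010-12-12 00:00:00'
  AND tr_datecreated <= '2012-12-12 00:00:00';
```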
Edit: Subquery explanation
The subquery selects the id, date created, and depart columns from the transaction table for the date range that you listed above. The ‘t’ after the closing parenthesis at the end of the subquery aliases the inner query so that you can use its data in the outer query. Also, where I have 'id', 'date', and 'depart' inside the subquery, that is aliasing too: it lets you use those values without typing out the full column name.
Hope this helped.