I have reached an impasse with my knowledge regarding mysql joins, and the query

Asked: June 8, 2026 (2026-06-08T16:33:37+00:00)

I have reached an impasse with my knowledge of MySQL joins, and the query I’m trying to execute is taking far too long. Although I’ve only been learning MySQL on my own for a short while, I have put time into reading about the mechanics of indexes and joins, done many Google searches, and tried a few different query formats, all to no avail. I need help, please.

Firstly, my database is, at the moment, meant to be optimized for the speed of select queries. I know I have a few too many indexes… my approach to learning MySQL is to create a few too many indexes, examine which one the MySQL optimizer chooses for my purposes (using explain), and then work out why it chose that index.

Anyhow, I have four tables: table1, table2, table3, table4…

table1.ID1 is the primary key of table1; the other data in a table1 row may be split into multiple rows in table2.
table2.ID1 identifies every entry in table2 that is built upon content from table1
table2.ID2 is the primary key for table2
table3.ID2 identifies every entry in table3 that is built upon content from table2
table3.ID3 is the primary key for table3
table4.ID3 identifies every entry in table4 that is built upon content from table3

Not every entry in table1 has corresponding data in table2, and similarly table2 to table3, and table3 to table4.

What I need to do is retrieve the distinct values of ID2 that appear within a date range, and also only if the table2 content eventually appears in table4. The challenge I’m facing is that only table1 has a date column, and I need only entries that also appear in table4.

The following query takes approx 2 minutes.

select table2.ID2 from table1 
left join table2 on
table1.ID1 = table2.ID1
left join table3 on
table3.ID2 = table2.ID2 
left join table4 on
table4.ID3 = table3.ID3
where table1.Date between "2012-03-11" and "2012-03-18"

Using explain with the above query, I see no reason why it should take so long.

+----+-------------+--------------+-------+----------------------+----------+---------+------------------------------+-------+--------------------------+
| id | select_type | table        | type  | possible_keys        | key      | key_len | ref                          | rows  | Extra                    |
+----+-------------+--------------+-------+----------------------+----------+---------+------------------------------+-------+--------------------------+
|  1 | SIMPLE      | table1       | range | ...                  | Datekey  | 9       | NULL                         | 17528 | Using where; Using index |
|  1 | SIMPLE      | table2       | ref   | ...                  | ID1key   | 8       | mydata.table1.POSTID         |     1 |                          |
|  1 | SIMPLE      | table3       | ref   | ...                  | ID2key   | 8       | mydata.table2.SrcID          |    20 |                          |
|  1 | SIMPLE      | table4       | ref   | ...                  | ID3key   | 8       | mydata.table3.ParsedID       |    10 | Using index              |
+----+-------------+--------------+-------+----------------------+----------+---------+------------------------------+-------+--------------------------+

I’ve replaced the names of the possible keys with ‘…’ as they’re not that important. In any case, a key is selected.

Moreover, the number of rows in the query’s result set is much larger than the 17528 matching rows that explain reports. How could it be more?

What am I doing wrong? I’ve also tried inner join with no luck. The way I interpret my query is as a 4-way Venn diagram, with very few rows meeting the overlapping criteria, further optimized by an index on the date range.

I at least get the result set that I want if I add ‘distinct(table2.ID2)’, but why am I otherwise getting a result set much longer than I’d expect, and why is it taking so long?

Sorry if any part of my question has been ambiguous, I’d be happy to clarify as needed.

Thanks,
Brian

EDIT:

All indexes refer to a BIGINT column, as I expect my database to get rather large and need quite a number of unique row identifiers… perhaps bigint is overkill and reducing the size of that column and/or the index would speed things up further.

Here’s my final solution, based on the accepted answer below:

select ID2 from table2
where exists
    (select 1 from table1
    where table1.Date between "2012-03-11" and "2012-03-18" and table2.ID1 = table1.ID1
    )
and exists
    (select 1 from table3
    where table3.ID2 = table2.ID2 and exists
        (select 1 from table4 where table4.ID3 = table3.ID3)
    )

Additionally, I realized I was missing a multi-field index, associating table2.ID1 and table2.ID2… After adding this index, this statement returns in about 11 seconds, and returns approx 20,000 rows.
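
For reference, a composite index of the kind described here could be created roughly as follows. This is only a sketch: the index name idx_table2_id1_id2 is made up, and the column order (ID1 first) is an assumption based on the lookup by table2.ID1 in the query above.

```sql
-- Hypothetical index name. ID1 comes first because the query looks rows up by ID1;
-- ID2 is included so it can be read from the index without touching the table rows.
create index idx_table2_id1_id2 on table2 (ID1, ID2);
```

Running explain on the query again afterwards shows whether the optimizer actually picks the new index.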

I think this is reasonable considering the number of rows in each of my tables:
table1: ~480,000
table2: ~480,000
table3: ~6,000,000
table4: ~60,000,000

Does this sound efficient? I’ll accept the answer once I get confirmation that this is the best performance I should expect. I’m running on a 3 GHz Xeon system with 3 GB of memory, Ubuntu 12.04, and MySQL 5.5.24.

1 Answer

Editorial Team · Answer 1 · 2026-06-08T16:33:39+00:00

In all likelihood, your tables have multiple matches between them. Say table1 matches 5 rows in table2 and 10 rows in table3. Then you end up with 50 rows in the output.
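
To see the multiplication on a small scale, here is a sketch with made-up toy tables (demo1, demo2 and demo3 are hypothetical names, not the tables from the question). One parent row has two children, and each child has three children of its own.

```sql
-- Toy tables (hypothetical names) that mimic the table1 -> table2 -> table3 chain.
create table demo1 (ID1 bigint primary key);
create table demo2 (ID2 bigint primary key, ID1 bigint);
create table demo3 (ID3 bigint primary key, ID2 bigint);

insert into demo1 values (1);
insert into demo2 values (10, 1), (11, 1);                 -- 2 rows for ID1 = 1
insert into demo3 values (100, 10), (101, 10), (102, 10),
                         (103, 11), (104, 11), (105, 11);  -- 3 rows per ID2

-- The single demo1 row fans out to 2 * 3 = 6 joined rows.
select count(*) as joined_rows
from demo1
left join demo2 on demo2.ID1 = demo1.ID1
left join demo3 on demo3.ID2 = demo2.ID2;

-- Only 2 distinct ID2 values are behind those 6 rows.
select count(distinct demo2.ID2) as distinct_id2
from demo1
left join demo2 on demo2.ID1 = demo1.ID1
left join demo3 on demo3.ID2 = demo2.ID2;
```

The same effect, at a much larger scale, would explain why the result set is far bigger than the row estimate for table1 alone.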

To solve this, you need to limit your joins to one row per table.

One way is to use the in clause. If you are using the joins for filtering, then you can use a where clause instead:

where table2.id1 in (select table1.id1 from table1)

The “in” prevents duplicates.
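
Applied to the tables in the question, the filter-only joins might be rewritten along these lines. This is an untested sketch that assumes the column names given in the question (ID1, ID2, ID3, Date) and that only table2.ID2 is needed in the output.

```sql
-- Each IN acts purely as a filter, so a table2 row is returned at most once,
-- no matter how many rows match in table3 or table4.
select table2.ID2
from table2
where table2.ID1 in (
        select table1.ID1
        from table1
        where table1.Date between '2012-03-11' and '2012-03-18'
      )
  and table2.ID2 in (
        select table3.ID2
        from table3
        where table3.ID3 in (select table4.ID3 from table4)
      );
```

Because table2.ID2 is the primary key of table2, no distinct is needed on the outer select.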

The other alternative is to pre-aggregate the subqueries in the joins (for example with group by or distinct), so that each join matches at most one row.
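
A sketch of the pre-aggregation idea, again assuming the question's column names: each derived table is reduced to one row per join key before it is joined, so the joins cannot multiply rows. The aliases d1 and d3 are made up for illustration.

```sql
select table2.ID2
from table2
join (
      -- one row per ID1 inside the date range (ID1 is the primary key of table1)
      select table1.ID1
      from table1
      where table1.Date between '2012-03-11' and '2012-03-18'
     ) d1 on d1.ID1 = table2.ID1
join (
      -- one row per ID2 that reaches table4 through table3
      select distinct table3.ID2
      from table3
      join table4 on table4.ID3 = table3.ID3
     ) d3 on d3.ID2 = table2.ID2;
```

Whether this beats the other forms depends on the data. The second derived table has to be built in full before the join, which may be expensive on tables as large as the ones described.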

MySQL seems to prefer a slightly different construct for the where clause, from an optimization perspective:

where exists (select 1 from table1 where table1.id1 = table2.id1)
