The Archive Base


The Archive Base Latest Questions

Editorial Team
Asked: June 8, 2026

I have reached an impasse with my knowledge regarding mysql joins, and the query


I have reached an impasse with my knowledge of MySQL joins, and the query I’m trying to execute is taking way too long. Although I’m only a short while into learning MySQL on my own, I have put time into reading about the mechanics of indexes and joins, done many Google searches, and tried a few different query formats, to no avail. I need help, please.

First, I will say that my database is, at the moment, meant to be optimized for the speed of select queries. I know I have a few too many indexes. My approach to learning MySQL is to create a few too many indexes, examine which one the MySQL optimizer chooses for my purposes (using explain), and then work out why it chose that index.

Anyhow, I have four tables: table1, table2, table3, table4…

table1.ID1 is the primary key, and other data in table1 might be split into multiple entries in table2.
table2.ID1 identifies every entry in table2 that is built upon content from table1
table2.ID2 is the primary key for table2
table3.ID2 identifies every entry in table3 that is built upon content from table2
table3.ID3 is the primary key for table3
table4.ID3 identifies every entry in table4 that is built upon content from table3

Not every entry in table1 has corresponding data in table2, and similarly table2 to table3, and table3 to table4.

What I need to do is retrieve the distinct values of ID2 that appear within a date range, and also only if the table2 content eventually appears in table4. The challenge I’m facing is that only table1 has a date column, and I need only entries that also appear in table4.

The following query takes approx 2 minutes.

select table2.ID2 from table1 
left join table2 on
table1.ID1 = table2.ID1
left join table3 on
table3.ID2 = table2.ID2 
left join table4 on
table4.ID3 = table3.ID3
where table1.Date between "2012-03-11" and "2012-03-18"

By using explain with the above query, I see no reason why it should take so long.

+----+-------------+--------------+-------+----------------------+----------+---------+------------------------------+-------+--------------------------+
| id | select_type | table        | type  | possible_keys        | key      | key_len | ref                          | rows  | Extra                    |
+----+-------------+--------------+-------+----------------------+----------+---------+------------------------------+-------+--------------------------+
|  1 | SIMPLE      | table1       | range | ...                  | Datekey  | 9       | NULL                         | 17528 | Using where; Using index |
|  1 | SIMPLE      | table2       | ref   | ...                  | ID1key   | 8       | mydata.table1.POSTID         |     1 |                          |
|  1 | SIMPLE      | table3       | ref   | ...                  | ID2key   | 8       | mydata.table2.SrcID          |    20 |                          |
|  1 | SIMPLE      | table4       | ref   | ...                  | ID3key   | 8       | mydata.table3.ParsedID       |    10 | Using index              |
+----+-------------+--------------+-------+----------------------+----------+---------+------------------------------+-------+--------------------------+

I’ve replaced the names of the possible keys with ‘...’ as it’s not that important. In any case, a key is selected.

Moreover, the number of rows in the resultset in the query is much more than the purported matching 17528 rows in the explain resultset. How could it be more??

What am I doing wrong? I’ve also tried inner join with no luck. The way I interpret my query is as a 4-way Venn diagram, with very few rows meeting the overlapping criteria, further optimized by an index on the date range.

I at least get the resultset that I want if I add ‘distinct(table2.ID2)’, but why am I otherwise getting a resultset much longer than what I’d expect, and why is it taking so long?

Sorry if any part of my question has been ambiguous, I’d be happy to clarify as needed.

Thanks,
Brian

EDIT:

All indexes refer to a BIGINT column, as I expect my database to get rather large and need quite a number of unique row identifiers… perhaps bigint is overkill and reducing the size of that column and/or the index would speed things up further.

Here’s my final solution, based on the accepted answer below:

select ID2 from table2
where exists
    (select 1 from table1
    where table1.Date between "2012-03-11" and "2012-03-18" and table2.ID1 = table1.ID1
    )
and exists
    (select 1 from table3
    where table3.ID2 = table2.ID2 and exists
        (select 1 from table4 where table4.ID3 = table3.ID3)
    )

Additionally, I realized I was missing a multi-field index, associating table2.ID1 and table2.ID2… After adding this index, this statement returns in about 11 seconds, and returns approx 20,000 rows.
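A sketch of what such an index might look like. The index name and the column order are assumptions, since the question gives neither; running explain again afterwards shows whether the optimizer actually picks it up.

```sql
-- Hypothetical index name; the (ID1, ID2) column order is an assumption.
alter table table2 add index idx_table2_id1_id2 (ID1, ID2);

-- Check which key the optimizer now chooses for the table2 lookup.
explain
select ID2 from table2
where exists
    (select 1 from table1
     where table1.Date between '2012-03-11' and '2012-03-18'
       and table1.ID1 = table2.ID1);
```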

I think this is reasonable considering the number of rows in each of my tables
table1: ~480,000
table2: ~480,000
table3: ~6,000,000
table4: ~60,000,000

Does this sound efficient? I’ll accept the answer after I get confirmation that this is the best performance I should expect. I’m running on a 3 GHz Xeon system with 3 GB of memory, Ubuntu 12.04, and MySQL 5.5.24.


1 Answer

  1. Editorial Team
    Added an answer on June 8, 2026 at 4:33 pm

    In all likelihood, your tables have multiple matches between them. Say one row in table1 matches 5 rows in table2, and each of those matches 10 rows in table3. Then that single table1 row produces 50 rows in the output.
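The multiplication can be reproduced on a small scale. The following is a minimal sketch using hypothetical toy tables t1, t2 and t3 (not the asker's schema), with 2 and 3 matches instead of 5 and 10 to keep the inserts short:

```sql
-- Hypothetical toy tables, only to illustrate join fan-out.
create table t1 (id1 bigint primary key);
create table t2 (id2 bigint primary key, id1 bigint);
create table t3 (id3 bigint primary key, id2 bigint);

insert into t1 values (1);
-- 2 rows in t2 point at the single t1 row
insert into t2 values (10, 1), (11, 1);
-- 3 rows in t3 point at each t2 row
insert into t3 values (100, 10), (101, 10), (102, 10),
                      (103, 11), (104, 11), (105, 11);

-- One t1 row goes in, but the join returns 2 * 3 = 6 rows.
select count(*) as joined_rows
from t1
left join t2 on t2.id1 = t1.id1
left join t3 on t3.id2 = t2.id2;

-- Only 2 distinct id2 values exist among those 6 rows.
select count(distinct t2.id2) as distinct_id2
from t1
left join t2 on t2.id1 = t1.id1
left join t3 on t3.id2 = t2.id2;
```

This is also why the row estimate for the first table in the explain output can be far smaller than the final resultset: each later join multiplies the rows carried forward from the previous step.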

    To solve this, you need to limit your joins to one row per table.

    One way is to use the in clause. If you are using the joins for filtering, then you can use a where clause instead:

    where table2.id1 in (select table1.id1 from table1)
    

    The “in” prevents duplicates, because it only tests whether a match exists and never adds rows to the result.
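Applied to the four tables in the question, the "in" approach might look like the sketch below. Table and column names follow the question; the exact form is an assumption about how the filter would be spelled out, not the answerer's own query.

```sql
-- Sketch only: names (table1..table4, ID1..ID3, Date) are taken from the question.
select table2.ID2
from table2
where table2.ID1 in (
        -- table1 rows inside the date range
        select table1.ID1
        from table1
        where table1.Date between '2012-03-11' and '2012-03-18'
      )
  and table2.ID2 in (
        -- table3 rows whose content eventually appears in table4
        select table3.ID2
        from table3
        where table3.ID3 in (select table4.ID3 from table4)
      );
```

Because table2.ID2 is the primary key and table2 is the only table in the outer from clause, each ID2 can appear at most once, so no distinct is needed.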

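    The other alternative is to pre-aggregate the tables inside the joins, for example by joining to subqueries that use distinct or group by, so that each join contributes at most one row per key.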
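One way to read the pre-aggregation idea is to join table2 to derived tables that have already been reduced to one row per key. This is a sketch under that assumption, using the names from the question; the aliases t2, d1 and d3 are hypothetical.

```sql
-- Sketch only: each derived table yields at most one row per join key,
-- so the outer joins cannot multiply table2 rows.
select t2.ID2
from table2 t2
join (
        -- table1.ID1 is the primary key, so this is already one row per ID1
        select ID1
        from table1
        where Date between '2012-03-11' and '2012-03-18'
     ) d1 on d1.ID1 = t2.ID1
join (
        -- collapse the table3/table4 fan-out to one row per ID2
        select distinct table3.ID2
        from table3
        join table4 on table4.ID3 = table3.ID3
     ) d3 on d3.ID2 = t2.ID2;
```

Whether this is faster than the other forms depends on the data: the second derived table has to be built from all of table3 and table4 before the join, so it is worth comparing with explain rather than assuming a win.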

    MySQL seems to prefer a slightly different construct for the where clause, from an optimization perspective:

    where exists (select 1 from table1 where table1.id1 = table2.id1)
    