Sorry for the length, wanted to give a complete description! I have a need

Asked: June 1, 2026 (2026-06-01T13:43:07+00:00)

Sorry for the length, I wanted to give a complete description! I need to show a report with some info about an id from another table whenever someone moves out of a given country within the last x days. Note that the same country can appear in the table multiple times for an id (the info is queried at regular intervals, and the person may not have moved in between), and an id can also have different country entries (as they change countries).

Quick explanation of the data:
I have the table below:

CREATE TABLE IF NOT EXISTS `country` (
`id` mediumint(8) unsigned NOT NULL,
`timestamp` datetime NOT NULL,
`country` varchar(64) DEFAULT NULL,
PRIMARY KEY (`id`,`timestamp`),
KEY `country` (`country`),
KEY `timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

and the entries look like this:

41352   2012-03-26 15:46:01     Jamaica
41352   2012-03-05 22:49:41     Jamaican Applicant
41352   2012-02-26 15:46:01     Jamaica
41352   2012-02-16 12:11:19     Jamaica
41352   2012-02-05 23:00:30     Jamaican Applicant

This table has about ~214,590 total rows right now, but will have millions once the test data is replaced with real data.

What I want is some info on everyone who has left x country since y time. Here is how I would like it outputted assuming it was run on the data above:

id     name   last   country  TIMESTAMP            o_timestamp
41352  Sweet  Mercy  Jamaica  2012-03-26 15:46:01  2012-03-05 22:49:41
41352  Sweet  Mercy  Jamaica  2012-02-16 12:11:19  2012-02-05 23:00:30

Where o_timestamp is newer than a certain date (let's say 100), country is where they moved to, and the old country (not shown) they came from is whatever I pass into the query (Jamaican Applicant, based on the data above).

I developed the following query to satisfy the requirements and was using a certain id to test:

SELECT a.id,
       c.name,
       c.last,
       a.country,
       a.timestamp,
       b.timestamp AS o_timestamp
FROM   country a
       INNER JOIN user_info c
         ON ( a.id = c.id )
       LEFT JOIN country AS b
         ON ( a.id = b.id
              AND a.timestamp != b.timestamp
              AND a.country != b.country )
WHERE  b.timestamp = (SELECT c.timestamp
                      FROM   country c
                      WHERE  a.id = c.id
                             AND a.timestamp > c.timestamp
                      ORDER  BY c.timestamp DESC
                      LIMIT  1) 
       AND a.id = 965

I got this to complete in ( 7 total, Query took 0.0050 sec)

and an EXPLAIN EXTENDED revealed the following:

id  select_type         table  type    possible_keys      key        key_len  ref         rows  filtered  Extra
1   PRIMARY             c      const   PRIMARY            PRIMARY    3        const       1     100.00
1   PRIMARY             a      ref     PRIMARY            PRIMARY    3        const       16    100.00
1   PRIMARY             b      eq_ref  PRIMARY,timestamp  PRIMARY    11       const,func  1     100.00    Using where
2   DEPENDENT SUBQUERY  c      index   PRIMARY,timestamp  timestamp  8        NULL        1     700.00    Using where; Using index

So I figured I was in pretty good shape and tried this:

SELECT a.id,
       c.name,
       c.last,
       a.country,
       a.timestamp,
       b.timestamp AS o_timestamp
FROM   country a
       INNER JOIN user_info c
         ON ( a.id = c.id )
       LEFT JOIN country AS b
         ON ( a.id = b.id
              AND a.timestamp != b.timestamp
              AND a.country != b.country )
WHERE  b.timestamp = (SELECT c.timestamp
                      FROM   country c
                      WHERE  a.id = c.id
                             AND a.timestamp > c.timestamp
                      ORDER  BY c.timestamp DESC
                      LIMIT  1) 
       AND b.country = "whatever" AND timestamp > DATE_SUB(NOW(), INTERVAL 7 DAY)

This query took an amazing 6 minutes and 54 seconds to complete for a country that had 200 records, and never completed for a country with 9000 records in the db (I went out for the afternoon and night and came home, so about 8 hours in total). In real data, a country could easily appear 10000 times; 100k would not be unreasonable.

So I ran EXPLAIN EXTENDED and got this:

id  select_type         table       type    possible_keys      key        key_len  ref   rows  filtered  Extra
1   PRIMARY             <derived2>  ALL     NULL               NULL       NULL     NULL  3003  100.00
1   PRIMARY             c           eq_ref  PRIMARY            PRIMARY    3        b.id  1     100.00
1   PRIMARY             a           ref     PRIMARY            PRIMARY    3        b.id  7     100.00    Using where
3   DEPENDENT SUBQUERY  c           index   PRIMARY,timestamp  timestamp  8        NULL  1     700.00    Using where; Using index
2   DERIVED             country     range   country,timestamp  country    195      NULL  474   100.00    Using where; Using index

So it looks larger, but not unreasonably so.

[Removed the config variables and the performance info for space, since it's probably a query issue. Let me know if they are needed.]

Let me know if I missed anything.

1 Answer

Editorial Team · Answer 1 · 2026-06-01T13:43:08+00:00

The problem isn’t the criterion you added; it’s the one you dropped that does the damage. In the original query, you had:

AND a.id = 965

This means that the query execution does not need to read the entire a (country) table. In your second, performance-killed query, you change that criterion to:

AND b.country = "whatever"
AND timestamp > DATE_SUB(NOW(), INTERVAL 7 DAY)

You no longer have a really restrictive criterion on a, so things work much more slowly.

Things are more subtle because b is another reference to country. Still, moving the condition from a to b (where b is on the outer side of an outer join) is not a trivial change: the server can no longer narrow a down to a handful of rows up front, so it takes far longer to evaluate the query conditions.

You asked: “Does that mean that because I’m not looking for a specific id, I’m out of luck?”

With the given query structure, the answer seems to be ‘yes’, but the given query structure may be, shall we say, sub-optimal.

Your ‘fast enough when working on one ID’ query is:

SELECT a.id,
       c.name,
       c.last,
       a.country,
       a.timestamp,
       b.timestamp AS o_timestamp
FROM   country a
       INNER JOIN user_info c
         ON ( a.id = c.id )
       LEFT JOIN country AS b
         ON ( a.id = b.id
              AND a.timestamp != b.timestamp
              AND a.country != b.country )
WHERE  b.timestamp = (SELECT c.timestamp
                      FROM   country c
                      WHERE  a.id = c.id
                             AND a.timestamp > c.timestamp
                      ORDER  BY c.timestamp DESC
                      LIMIT  1) 
       AND a.id = 965

I don’t fully understand this query and what it is attempting to do. You need to be aware that outer joins are more expensive than inner joins, and conditions on the outer-joined table like

b.timestamp = (...correlated sub-query...)

are fiendishly expensive. One problem is that there might be a NULL in the b columns including timestamp, but the sub-query is wasted on that because the condition won’t be satisfied unless the values are non-null, so we end up wondering ‘why an OUTER join’?

When you added the revised condition, you should have received an ‘ambiguous column name’ error, since the unqualified timestamp could come from a or b (both are references to country). Also, the b.country = "whatever" condition is another that only makes sense when the b values are not null, so again, the OUTER join is dubious.
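To make the point about the dubious OUTER join concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made purely so the example is self-contained), with a tiny made-up data set in which id 2 never changes country. A WHERE condition on a column of b rejects the NULL-extended rows, so the LEFT JOIN should return exactly what a plain JOIN returns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (id INTEGER, timestamp TEXT, country TEXT,
                          PRIMARY KEY (id, timestamp));
    INSERT INTO country VALUES
        (1, '2012-02-05 23:00:30', 'Jamaican Applicant'),
        (1, '2012-02-16 12:11:19', 'Jamaica'),
        (2, '2012-02-10 08:00:00', 'Jamaica');  -- id 2 never changed country
""")

# Same query twice; only the join keyword differs.
TEMPLATE = """
    SELECT a.id, a.timestamp, b.timestamp
      FROM country AS a
      {join} country AS b
        ON a.id = b.id AND a.country != b.country
     WHERE b.country = 'Jamaican Applicant'
     ORDER BY a.id, a.timestamp
"""

left_rows = conn.execute(TEMPLATE.format(join="LEFT JOIN")).fetchall()
inner_rows = conn.execute(TEMPLATE.format(join="JOIN")).fetchall()

# The NULL-extended row for id 2 cannot satisfy b.country = '...',
# so both variants produce the same result.
print(left_rows == inner_rows)
```

If the two result sets match on your real data as well, the outer join is buying nothing and can be written as an inner join.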

As I understand it, the country table contains records about who entered which country and when. Also, FWIW, I’m tolerably certain that the join with the user_info table is a negligible performance issue; the problem is all down to the three references to the country table.

Judging from some of the clarifications, you could build up the query incrementally, maybe something like this.

Find each pair of country records for the same id where the records are adjacent in time sequence, and the older of the pair is for a given country (‘Jamaican Applicant’) and the newer is for a different country.

The easy part of this is:

SELECT a.id, a.country, a.timestamp, b.country, b.timestamp
  FROM country AS a
  JOIN country AS b
    ON a.id = b.id
   AND b.timestamp > a.timestamp
   AND a.country = 'Jamaican Applicant'
   AND b.country != a.country

This does most of the job, but does not ensure adjacency for the entries. To do that, we have to insist that there is no record in the country table for the same id in between (but not including) the two timestamps, a.timestamp and b.timestamp. That’s an extra NOT EXISTS condition:

SELECT a.id,
       a.country   AS o_country,
       a.timestamp AS o_timestamp,
       b.country   AS n_country,
       b.timestamp AS n_timestamp
  FROM country AS a
  JOIN country AS b
    ON a.id = b.id
   AND b.timestamp > a.timestamp
   AND a.country = 'Jamaican Applicant'
   AND b.country != a.country
 WHERE NOT EXISTS
       (SELECT *
          FROM country AS c
         WHERE c.timestamp > a.timestamp
           AND c.timestamp < b.timestamp
           AND c.id = a.id
       )

Note that BETWEEN AND notation is not suitable. It includes the end points in the range, but we explicitly need the end points excluded.
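As a quick sanity check of the adjacency idea, the sketch below loads the five sample rows from the question into an in-memory SQLite database (SQLite via Python's sqlite3 is an assumption here, standing in for MySQL) and runs the NOT EXISTS query twice: once with the strict inequalities and once with BETWEEN. The `{gap}` placeholder and the STRICT/INCLUSIVE names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (
        id INTEGER NOT NULL,
        timestamp TEXT NOT NULL,
        country TEXT,
        PRIMARY KEY (id, timestamp)
    );
    INSERT INTO country VALUES
        (41352, '2012-03-26 15:46:01', 'Jamaica'),
        (41352, '2012-03-05 22:49:41', 'Jamaican Applicant'),
        (41352, '2012-02-26 15:46:01', 'Jamaica'),
        (41352, '2012-02-16 12:11:19', 'Jamaica'),
        (41352, '2012-02-05 23:00:30', 'Jamaican Applicant');
""")

QUERY = """
    SELECT a.id, a.country, a.timestamp, b.country, b.timestamp
      FROM country AS a
      JOIN country AS b
        ON a.id = b.id
       AND b.timestamp > a.timestamp
       AND a.country = ?
       AND b.country != a.country
     WHERE NOT EXISTS
           (SELECT 1
              FROM country AS c
             WHERE c.id = a.id
               AND {gap})
     ORDER BY a.timestamp
"""
STRICT = "c.timestamp > a.timestamp AND c.timestamp < b.timestamp"
INCLUSIVE = "c.timestamp BETWEEN a.timestamp AND b.timestamp"

old_country = ("Jamaican Applicant",)
strict_pairs = conn.execute(QUERY.format(gap=STRICT), old_country).fetchall()
between_pairs = conn.execute(QUERY.format(gap=INCLUSIVE), old_country).fetchall()

for row in strict_pairs:
    print(row)
# BETWEEN matches the end-point rows themselves, so NOT EXISTS rejects every pair.
print(len(between_pairs))
```

With the strict inequalities, the two pairs found are the moves of 2012-02-16 and 2012-03-26, which line up with the output the question asks for; the BETWEEN variant finds nothing, because each candidate pair always "contains" its own end points.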

Given the list of country entries above, we now need to select just those rows where the … hmmm, well, what is the criterion? I think you get to choose, but the result can be joined with the user_info table easily:

SELECT e.id, u.name, u.last, e.o_country, e.o_timestamp, e.n_country, e.n_timestamp
  FROM (SELECT a.id,
               a.country   AS o_country,
               a.timestamp AS o_timestamp,
               b.country   AS n_country,
               b.timestamp AS n_timestamp
          FROM country AS a
          JOIN country AS b
            ON a.id = b.id
           AND b.timestamp > a.timestamp
           AND a.country = 'Jamaican Applicant'
           AND b.country != a.country
         WHERE NOT EXISTS
               (SELECT *
                  FROM country AS c
                 WHERE c.timestamp > a.timestamp
                   AND c.timestamp < b.timestamp
                   AND c.id = a.id
               )
       ) AS e
  JOIN user_info AS u ON e.id = u.id
 WHERE e.o_timestamp > DATE_SUB(NOW(), INTERVAL 7 DAY);

I’m not about to guarantee that the performance will be better (or even that it is syntactically correct; it hasn’t been past an SQL DBMS). But I think the complex query structure for getting the adjacent dates is neater and probably better performing than the original code. Note, in particular, that it does not use any outer join, (explicit) ordering or limit clauses. That should help.
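For what it's worth, here is a rough way to try the joined query without a MySQL server. This is a sketch under several assumptions: SQLite (through Python's sqlite3) stands in for MySQL, user_info is assumed to have just the id, name and last columns that the question's query uses, and a fixed :cutoff parameter replaces DATE_SUB(NOW(), INTERVAL 7 DAY) so that the 2012 sample rows can match. It says nothing about performance on millions of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_info (id INTEGER PRIMARY KEY, name TEXT, last TEXT);
    CREATE TABLE country (id INTEGER NOT NULL, timestamp TEXT NOT NULL,
                          country TEXT, PRIMARY KEY (id, timestamp));
    INSERT INTO user_info VALUES (41352, 'Sweet', 'Mercy');
    INSERT INTO country VALUES
        (41352, '2012-03-26 15:46:01', 'Jamaica'),
        (41352, '2012-03-05 22:49:41', 'Jamaican Applicant'),
        (41352, '2012-02-26 15:46:01', 'Jamaica'),
        (41352, '2012-02-16 12:11:19', 'Jamaica'),
        (41352, '2012-02-05 23:00:30', 'Jamaican Applicant');
""")

REPORT = """
    SELECT e.id, u.name, u.last, e.n_country, e.n_timestamp, e.o_timestamp
      FROM (SELECT a.id,
                   a.timestamp AS o_timestamp,
                   b.country   AS n_country,
                   b.timestamp AS n_timestamp
              FROM country AS a
              JOIN country AS b
                ON a.id = b.id
               AND b.timestamp > a.timestamp
               AND a.country = :old_country
               AND b.country != a.country
             WHERE NOT EXISTS
                   (SELECT 1
                      FROM country AS c
                     WHERE c.id = a.id
                       AND c.timestamp > a.timestamp
                       AND c.timestamp < b.timestamp)
           ) AS e
      JOIN user_info AS u ON e.id = u.id
     WHERE e.o_timestamp > :cutoff
     ORDER BY e.o_timestamp DESC
"""

def moves_since(cutoff):
    """Rows for people who left 'Jamaican Applicant' after the cutoff."""
    params = {"old_country": "Jamaican Applicant", "cutoff": cutoff}
    return conn.execute(REPORT, params).fetchall()

recent = moves_since("2012-03-01 00:00:00")   # only the March move
everything = moves_since("2012-01-01 00:00:00")  # both moves
for row in everything:
    print(row)
```

Running EXPLAIN on the equivalent MySQL query against the real tables would be the next step, since only that can show whether the plan actually improves.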
