I’m having some trouble with a complex query involving the following tables. Assume purchase_time uses the built-in SQLite timestamp datatype.
I am trying to return the customers whose second purchase is within 4 hours of their first purchase and, if it’s within 2 hours, was made at a different store.
I’m having trouble wrapping my head around how to refer to the specific rows to compare a first purchase with a second purchase.
purchases
purchase_id | customer_id | store_id | purchase_time
1 | 1 | 1 | 2009-01-27 10:00:00.0
2 | 1 | 2 | 2009-01-27 10:30:00.0
3 | 2 | 1 | 2009-01-27 10:00:00.0
4 | 2 | 1 | 2009-01-27 10:30:00.0
5 | 3 | 1 | 2009-01-27 10:00:00.0
6 | 3 | 2 | 2009-01-27 16:00:00.0
7 | 4 | 3 | 2009-01-27 10:00:00.0
8 | 4 | 3 | 2009-01-27 13:00:00.0
stores
store_id | misc columns...
1
2
3
customers
customer_id | f_name
1 | name1
2 | name2
3 | name3
4 | name4
The correct return would be name1, name4 in this case.
You’re going to be joining the purchase table to itself, and then selecting on one of the two criteria.
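The only real trick here is to formulate the different time criteria as:
1. purchases more than 2 hours but no more than 4 hours apart, from any store, and
2. purchases no more than 2 hours apart with a different store_id.
Both of which obviously apply for the same customer_id. So, we’ve got: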
The query joins purchases to itself by customer_id, first checks that you’re comparing earlier purchases to later purchases, and then applies the two different criteria, which are ORed. I find the time comparisons easiest to do with addtime() and then comparing the results. Others may prefer other ways.
SQL Fiddle here: http://sqlfiddle.com/#!2/14dda/2
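Since the question asks about SQLite, which has no addtime() function, here is a minimal sketch of the same self-join idea run through Python's sqlite3 module. It is an illustration under stated assumptions, not the query from the fiddle: purchase_time is assumed to be stored as ISO-8601 text, the gap is computed with strftime('%s', ...) instead of addtime(), and the function name matching_customers is made up for this example. Like the self-join described above, it compares every earlier purchase with every later one for the same customer, which coincides with "first versus second" only when each customer has exactly two purchases, as in the sample data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, f_name TEXT);
CREATE TABLE purchases (
    purchase_id   INTEGER PRIMARY KEY,
    customer_id   INTEGER,
    store_id      INTEGER,
    purchase_time TEXT  -- assumption: ISO-8601 text, as in the sample rows
);
"""

CUSTOMERS = [(1, "name1"), (2, "name2"), (3, "name3"), (4, "name4")]

PURCHASES = [
    (1, 1, 1, "2009-01-27 10:00:00.0"),
    (2, 1, 2, "2009-01-27 10:30:00.0"),
    (3, 2, 1, "2009-01-27 10:00:00.0"),
    (4, 2, 1, "2009-01-27 10:30:00.0"),
    (5, 3, 1, "2009-01-27 10:00:00.0"),
    (6, 3, 2, "2009-01-27 16:00:00.0"),
    (7, 4, 3, "2009-01-27 10:00:00.0"),
    (8, 4, 3, "2009-01-27 13:00:00.0"),
]

# Gap between the two purchases in whole seconds (2 h = 7200 s, 4 h = 14400 s).
GAP = ("(CAST(strftime('%s', p2.purchase_time) AS INTEGER)"
       " - CAST(strftime('%s', p1.purchase_time) AS INTEGER))")

QUERY = f"""
SELECT DISTINCT c.f_name
FROM purchases p1
JOIN purchases p2 ON p1.customer_id = p2.customer_id
JOIN customers c  ON c.customer_id  = p1.customer_id
WHERE p1.purchase_time < p2.purchase_time          -- earlier vs. later purchase
  AND (
        ({GAP} <= 7200 AND p1.store_id <> p2.store_id)  -- within 2 h, other store
     OR ({GAP} > 7200 AND {GAP} <= 14400)               -- between 2 h and 4 h
      )
ORDER BY c.f_name
"""


def matching_customers():
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customers VALUES (?, ?)", CUSTOMERS)
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", PURCHASES)
    names = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return names


print(matching_customers())
```

On the sample data this returns name1 and name4, which matches the expected result stated in the question.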
Results:
—
EDIT: Perhaps you’d get some efficiency by moving the p1.purchase_time < p2.purchase_time condition up into the join clause. This might be faster with lots of data, though the execution plans for this small amount of data are identical. You’d like the optimizer to eliminate all those cases where p1.purchase_time > p2.purchase_time before doing the more expensive comparisons. But that’s somewhat beyond the basic question of ways to write this query.
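To make the two placements concrete, here is a small sketch that runs both forms against the sample purchases in an in-memory SQLite database and returns the customer ids from each. The names WHERE_FORM, JOIN_FORM and run_both are invented for this example, and the strftime('%s', ...) gap expression is an SQLite stand-in for addtime(). It only shows that the two forms return the same rows; whether one is faster depends on the database engine and the data, so compare the actual execution plans before relying on either.

```python
import sqlite3

PURCHASES = [
    (1, 1, 1, "2009-01-27 10:00:00.0"),
    (2, 1, 2, "2009-01-27 10:30:00.0"),
    (3, 2, 1, "2009-01-27 10:00:00.0"),
    (4, 2, 1, "2009-01-27 10:30:00.0"),
    (5, 3, 1, "2009-01-27 10:00:00.0"),
    (6, 3, 2, "2009-01-27 16:00:00.0"),
    (7, 4, 3, "2009-01-27 10:00:00.0"),
    (8, 4, 3, "2009-01-27 13:00:00.0"),
]

# Gap in seconds between the later (p2) and earlier (p1) purchase.
GAP = ("(CAST(strftime('%s', p2.purchase_time) AS INTEGER)"
       " - CAST(strftime('%s', p1.purchase_time) AS INTEGER))")

# The two ORed criteria: within 2 h at another store, or between 2 h and 4 h.
CRITERIA = (f"(({GAP} <= 7200 AND p1.store_id <> p2.store_id)"
            f" OR ({GAP} > 7200 AND {GAP} <= 14400))")

# Ordering check kept in the WHERE clause.
WHERE_FORM = f"""
SELECT DISTINCT p1.customer_id
FROM purchases p1
JOIN purchases p2 ON p1.customer_id = p2.customer_id
WHERE p1.purchase_time < p2.purchase_time AND {CRITERIA}
ORDER BY p1.customer_id
"""

# Ordering check moved up into the join condition.
JOIN_FORM = f"""
SELECT DISTINCT p1.customer_id
FROM purchases p1
JOIN purchases p2 ON p1.customer_id = p2.customer_id
                 AND p1.purchase_time < p2.purchase_time
WHERE {CRITERIA}
ORDER BY p1.customer_id
"""


def run_both():
    """Return (where_form_ids, join_form_ids) for the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (purchase_id INTEGER PRIMARY KEY, "
        "customer_id INTEGER, store_id INTEGER, purchase_time TEXT)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", PURCHASES)
    where_ids = [row[0] for row in conn.execute(WHERE_FORM)]
    join_ids = [row[0] for row in conn.execute(JOIN_FORM)]
    conn.close()
    return where_ids, join_ids


print(run_both())
```

Prefixing either statement with EXPLAIN QUERY PLAN in SQLite (or EXPLAIN in MySQL) is one way to check whether the placement changes the plan on a realistic data set.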