I’m trying to write a query that requires a calculated column, using a subquery that receives the date reference via a variable. I’m not sure whether I’m “doing it right,” but the query essentially never finishes and spins for minutes on end. This is my query:
select @groupdate:=date_format(order_date,'%Y-%m'), count(distinct customer_email) as num_cust,
(
select count(distinct cev.customer_email) as num_prev
from _pj_cust_email_view cev
inner join _pj_cust_email_view as prev_purch on (prev_purch.order_date < @groupdate) and (cev.customer_email=prev_purch.customer_email)
where cev.order_date > @groupdate
) as prev_cust_count
from _pj_cust_email_view
group by @groupdate;
The subquery uses an inner join to perform a self-join that gives me only the count of people who purchased prior to the date in @groupdate. The EXPLAIN is below:
+----+----------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+----------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+
| 1 | PRIMARY | _pj_cust_email_view | ALL | NULL | NULL | NULL | NULL | 140147 | Using temporary; Using filesort |
| 2 | UNCACHEABLE SUBQUERY | cev | ALL | IDX_EMAIL | NULL | NULL | NULL | 140147 | Using where |
| 2 | UNCACHEABLE SUBQUERY | prev_purch | ref | IDX_EMAIL | IDX_EMAIL | 768 | cart_A.cev.customer_email | 1 | Using where |
+----+----------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+
The structure of the table _pj_cust_email_view is as follows:
'_pj_cust_email_view', 'CREATE TABLE `_pj_cust_email_view` (
`order_date` varchar(10) CHARACTER SET utf8 DEFAULT NULL,
`customer_email` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
KEY `IDX_EMAIL` (`customer_email`),
KEY `IDX_ORDERDATE` (`order_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1'
Again, I’m not really sure that this is the best way to accomplish this. Any criticism or direction is appreciated!
Update
I’ve made a little progress: I’m now doing the above procedurally by iterating through all known months, rather than the months found in the database, and setting the variables ahead of time. I still don’t like this. This is what I’ve got now:
Sets the user defined vars
set @startdate:='2010-08', @enddate:='2010-09';
Gets total distinct emails in the given range
select count(distinct customer_email) as num_cust
from _pj_cust_email_view
where order_date between @startdate and @enddate;
Gets the total count of customers who had purchased prior to the given range
select count(distinct cev.customer_email) as num_prev
from _pj_cust_email_view cev
inner join _pj_cust_email_view as prev_purch on (prev_purch.order_date < @startdate) and (cev.customer_email=prev_purch.customer_email)
where cev.order_date between @startdate and @enddate;
Here, @startdate is set to the start of the month and @enddate marks the end of that month’s range.
I really feel like this still can be done in one full query.
I don’t think you need to use subqueries at all, nor do you need to iterate over months.
Instead, I recommend you create a table to store all months. Even if you prepopulate it with 100 years of months, it would only have 1200 rows in it, which is trivial.
Store the actual start and end dates, so you can use the DATE data type and index the two columns properly.
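A minimal sketch of such a table, assuming the hypothetical name months and the hypothetical columns start_date and end_date (none of these names come from the question), with a few sample rows:

```sql
-- Hypothetical lookup table: one row per calendar month.
-- Table and column names are illustrative assumptions.
CREATE TABLE months (
  start_date DATE NOT NULL,  -- first day of the month
  end_date   DATE NOT NULL,  -- last day of the month
  PRIMARY KEY (start_date, end_date)
) ENGINE=InnoDB;

-- Sample rows; in practice the table would be prepopulated for the whole range of interest.
INSERT INTO months (start_date, end_date) VALUES
  ('2010-07-01', '2010-07-31'),
  ('2010-08-01', '2010-08-31'),
  ('2010-09-01', '2010-09-30');
```

Putting both columns in the primary key lets a range lookup on the month boundaries be answered from the index alone.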
edit: I think I understand your requirement a bit better, and I’ve cleaned up this answer. The following query may be right for you:
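As a rough illustration of the approach, one possible shape for such a query is sketched below. It assumes a hypothetical months table with DATE columns start_date and end_date, and that order_date holds 'YYYY-MM-DD' strings that compare correctly against dates; it is not necessarily the exact query intended.

```sql
-- Sketch only: months, start_date and end_date are assumed names.
SELECT
  m.start_date,
  COUNT(DISTINCT cev.customer_email)  AS num_cust,        -- customers who ordered in this month
  COUNT(DISTINCT prev.customer_email) AS prev_cust_count  -- of those, how many also ordered earlier
FROM months AS m
INNER JOIN _pj_cust_email_view AS cev
  ON cev.order_date BETWEEN m.start_date AND m.end_date
LEFT OUTER JOIN _pj_cust_email_view AS prev
  ON prev.customer_email = cev.customer_email
 AND prev.order_date < m.start_date
GROUP BY m.start_date;
```

The LEFT OUTER JOIN keeps first-time customers in num_cust, while COUNT(DISTINCT ...) ignores the NULLs they produce in prev, so no subquery or user variable is needed.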
If you create the following compound index in your table:
Then the query has the best chance of being an index-only query, and will run a lot faster.
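A compound index of the kind described might be created as follows. The index name and the column order (customer_email first, then order_date) are assumptions chosen to suit a self-join on email with a date filter.

```sql
-- Hypothetical index name; covers both columns the query reads,
-- so MySQL may be able to answer it from the index without touching table rows.
ALTER TABLE _pj_cust_email_view
  ADD KEY idx_email_orderdate (customer_email, order_date);
```

Whether the optimizer actually uses it as a covering index can be checked by looking for "Using index" in the Extra column of EXPLAIN.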
Below is the EXPLAIN optimization report from this query. Note type: index for each table.
Here’s some test data:
Here’s the result given that data, including the concatenated list of emails to make it easier to see.