I have a select statement which is infact a subquery within a larger select statement built up programmatically. The problem is if I elect to include this subquery it acts as a bottle neck and the whole query becomes painfully slow.
An example of the data is as follows:
Payment
.Receipt_no|.Person |.Payment_date|.Type|.Reversed|
2|John |01/02/2001 |PA | |
1|John |01/02/2001 |GX | |
3|David |15/04/2003 |PA | |
6|Mike |26/07/2002 |PA |R |
5|John |01/01/2001 |PA | |
4|Mike |13/05/2000 |GX | |
8|Mike |27/11/2004 |PA | |
7|David |05/12/2003 |PA |R |
9|David |15/04/2003 |PA | |
The subquery is as follows :
select Payment.Person,
Payment.amount
from Payment
inner join (Select min([min_Receipt].Person) 'Person',
min([min_Receipt].Receipt_no) 'Receipt_no'
from Payment [min_Receipt]
inner join (select min(Person) 'Person',
min(Payment_date) 'Payment_date'
from Payment
where Payment.reversed != 'R' and Payment.Type != 'GX'
group by Payment.Person) [min_date]
on [min_date].Person= [min_Receipt].Person and [min_date].Payment_date = [min_Receipt].Payment_date
where [min_Receipt].reversed != 'R' and [min_Receipt].Type != 'GX'
group by [min_Receipt].Person) [1stPayment]
on [1stPayment].Receipt_no = Payment.Receipt_no
This retrieves the first payment of each person by .Payment_date (ascending), .Receipt_no (ascending) where .type is not ‘GX’ and .Reversed is not ‘R’. As Follows:
Payment
.Receipt_No|.Person|.Payment_date
5|John |01/01/2001
3|David |15/04/2003
8|Mike |27/11/2004
Following Ahmads post –
From the following results
(3|David |15/04/2003)
and (9|David |15/04/2003)
I would only want the record with the lowest receipt_no. So
(3|David |15/04/2003)
So I added the aggregate function ‘min(Payment.receipt_no)’ grouping by person.
Query 1.
select min(Payment.Person) 'Person',
min(Payment.receipt_no) 'receipt_no'
from
Payment a
where
a.type<>'GX' and (a.reversed not in ('R') or a.reversed is null)
and a.payment_date =
(select min(payment_date) from Payment i
where i.Person=a.Person and i.type <> 'GX'
and (i.reversed not in ('R') or i.reversed is null))
group by a.Person
I added this as a subquery within my much larger query, however it still ran very slowly. So I tried rewriting the query whilst trying to avoid the use of aggregate functions and came up with the following.
Query 2.
SELECT
receipt_no,
person,
payment_date,
amount
FROM
payment a
WHERE
receipt_no IN
(SELECT
top 1 i.receipt_no
FROM
payment i
WHERE
(i.reversed NOT IN ('R') OR i.reversed IS NULL)
AND i.type<>'GX'
AND i.person = a.person
ORDER BY i.payment_date DESC, i.receipt_no ASC)
Which I wouldn’t necessarily think as more efficient. In fact if I run the two queries side by side on my larger data set Query 1. completes in a matter of milliseconds where as Query 2. takes several seconds.
However if I then add them as subqueries within a much larger query, the larger query completes in hours using Query 1. and completes in 40 seconds using Query 2.
I can only attribute this to the use of aggregate functions in one and not the other.
Following a comment from CodeReview –
I’ve also re-written the query using the Rank() command as suggested.
Query 3.
This method has also resulted in the speeding up the larger query, the larger query now taking some 28 seconds
However Rank() is only supprted from SQL 2005 upwards.