I have a large dataset of sent emails and their status codes.
ID  Recipient            Date        Status
1   someone@example.com  01/01/2010  1
2   someone@example.com  02/01/2010  1
3   them@example.com     01/01/2010  1
4   them@example.com     02/01/2010  2
5   them@example.com     03/01/2010  1
6   others@example.com   01/01/2010  1
7   others@example.com   02/01/2010  2
In this example:
- all emails sent to someone have a status of 1
- the middle email (by date) sent to them has a status of 2, but the latest is 1
- the last email sent to others has a status of 2
What I need to retrieve is a count of all emails sent to each person, and what the latest status code was.
The first part is fairly simple:
SELECT Recipient, Count(*) EmailCount
FROM Messages
GROUP BY Recipient
ORDER BY Recipient
Which gives me:
Recipient            EmailCount
someone@example.com  2
them@example.com     3
others@example.com   2
How can I get the most recent status code too?
The end result should be:
Recipient            EmailCount  LastStatus
someone@example.com  2           1
them@example.com     3           1
others@example.com   2           2
Thanks.
(The server is Microsoft SQL Server 2008; the query is run through an OleDbConnection in .NET.)
This is an example of a ‘max per group’ query. I think it is easiest to understand by splitting it up into two subqueries and then joining the results.
The first subquery is what you already have.
The second subquery uses the windowing function ROW_NUMBER to number the emails for each recipient starting with 1 for the most recent, then 2, 3, etc…
The results from the first query are then joined with the row from the second query that has row number 1, i.e. the most recent email. Because ROW_NUMBER assigns a distinct number to every row within a recipient's group, exactly one row per recipient has row number 1, so you get only one row for each recipient even when two emails tie for the most recent date.
Here is the query:
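A sketch of a query that follows the steps described above, assuming the table and columns from the question (Messages, Recipient, Date, Status) and that Date is stored as a date/datetime value. The aliases T1, T2 and rn are illustrative names only:

```sql
-- Sketch only: T1, T2 and rn are illustrative aliases; [Date] is assumed
-- to be a date/datetime column in the Messages table from the question.
SELECT T1.Recipient, T1.EmailCount, T2.Status AS LastStatus
FROM
(
    -- First subquery: number of emails per recipient.
    SELECT Recipient, COUNT(*) AS EmailCount
    FROM Messages
    GROUP BY Recipient
) AS T1
JOIN
(
    -- Second subquery: number each recipient's emails, newest first.
    SELECT Recipient, Status,
           ROW_NUMBER() OVER (PARTITION BY Recipient ORDER BY [Date] DESC) AS rn
    FROM Messages
) AS T2
    ON T1.Recipient = T2.Recipient
   AND T2.rn = 1          -- keep only the most recent email per recipient
ORDER BY T1.Recipient;
```

If two emails to the same recipient can share a date, adding a second sort key such as ID DESC inside the OVER clause would make the choice of "latest" email deterministic.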
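This gives the following results:
Recipient            EmailCount  LastStatus
others@example.com   2           2
someone@example.com  2           1
them@example.com     3           1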