Is there a faster way to select the distinct count of users from a table? Perhaps using row_number, partitioning, or cross apply?
I just can’t think of it right now.
Example:
Table UsageLog
UserId Date StoreNumber
Alice 200901 342
Alice 200902 333
Alice 200902 112
Bob 200901 112
Bob 200902 345
Charlie 200903 322
Here’s my current query:
select count(distinct userID), date
from
UsageLog
where
date between 200901 and 200902
group by date
My actual table has millions of rows and all columns are actually integers.
Is there a faster way to get the list of users?
Edit:
I already have nonclustered indexes on all individual columns. For some reason, the execution plan shows that I am still doing a table scan. I guess I should create a clustered index on Date. I’ll see how that works…
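DISTINCT is the way to go. The problem is that you are hitting the tipping point of the date index, so the plan falls back to scanning the whole table (a clustered index scan, or a table scan on a heap) instead. The nonclustered index on date does not contain userID, so every matching row needs an extra lookup into the base table, and once the date range matches more than a small fraction of the rows the optimizer decides a full scan is cheaper. See Kimberly L. Tripp's article on what a ‘tipping point’ is.
You need a covering index on date that also carries userID.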
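As a rough sketch, such a covering index could be created along these lines. The index name is invented for illustration; the table and column names are taken from the question, and the INCLUDE clause assumes SQL Server 2005 or later.

```sql
-- Hypothetical index name; table and columns come from the question.
-- Keyed on date so the BETWEEN filter can seek into the range;
-- userID is stored at the leaf level so the query never has to
-- look up the base row.
CREATE NONCLUSTERED INDEX IX_UsageLog_Date_UserId
    ON UsageLog (date)
    INCLUDE (userID);
```

With an index like this in place, the query from the question should be answerable from the index alone, which is what makes it "covering".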
A clustered index will also work, but it has other side effects. If a clustered index on date fits the rest of your data access patterns, then it is better than the covering index I propose.
Update:
The reverse-order index you tried on (userID, date) also works; it will range seek each userID. In fact, it is better than (date, userID) or (date) INCLUDE (userID), because it returns the userIDs pre-sorted, so the DISTINCT does not introduce an additional sort.
Still, I recommend going over the link I posted to understand why ‘an index on each individual column’ was not helping.
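For comparison, the two alternatives discussed above could be sketched like this. The index names are again made up, and whether either one is worth having depends on the rest of the workload.

```sql
-- Option 1 (hypothetical name): cluster the table on date.
-- A table can have only one clustered index, so this affects
-- every other query against UsageLog as well.
CREATE CLUSTERED INDEX CIX_UsageLog_Date
    ON UsageLog (date);

-- Option 2 (hypothetical name): the reverse-order index from the update,
-- keyed on userID first and date second.
CREATE NONCLUSTERED INDEX IX_UsageLog_UserId_Date
    ON UsageLog (userID, date);
```

Comparing the execution plan before and after creating each index is the simplest way to confirm which one the optimizer actually picks for the query in the question.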