Basically I’m trying to get a distinct count within this cubed result. But unfortuantly you cannot use Count(distinct(Field)) with cube and rollup (as stated here)
Here is what the Data Looks Like. (This is just a simple example I do expect duplicates in the Data)
Category1 Category2 ItemId
a b 1
a b 1
a a 1
a a 2
a c 1
a b 2
a b 3
a c 2
a a 1
a a 3
a c 4
Here is what I would like to do but it doesn’t work.
SELECT
Category1,
Category2,
Count(Distinct(ItemId))
FROM ItemList IL
GROUP BY
Category1,
Category2
WITH CUBE
I know that I can do a sub select like this to get the results I want:
SELECT
*,
(SELECT
Count(Distinct(ItemId))
FROM ItemList IL2
WHERE
(Q1.Category1 IS NULL OR Q1.Category1 IS NOT NULL AND Q1.Category1 = IL2.Category1)
AND
(Q1.Category2 IS NULL OR Q1.Category2 IS NOT NULL AND Q1.Category2 = IL2.Category2))
AS DistinctCountOfItems
FROM (SELECT
Category1,
Category2
FROM ItemList IL
GROUP BY
Category1,
Category2
WITH CUBE) Q1
But this runs slow when the result set is large due to the sub-select. Is there any other way to get a Distinct Count from a cubed result?
This is the result I want to see
Category1 Category2 DistinctCountOfItems
a a 3
a b 3
a c 3
a NULL 4
NULL NULL 4
NULL a 3
NULL b 3
NULL c 3
You should be able to clean up your “messy” answer like so:
Then the other option I came up with:
The efficiency may vary based on the indexes, number of columns being grouped, etc. For the test table I wrote, the sub-select and join (as opposed to the Unions) was slightly better. I don’t have access to a MSSQL 2000 instance at the moment (I tested on a 2005 instance), but I don’t think anything here is invalid.
UPDATE
An even better option, especially if you’re grouping on more than 2 columns (If you’re grouping on 8 columns the above code would require 256 join clauses to catch all null combinations!):