Could anyone explain to me the difference between the SQL statements below? I can see there is a difference, but I can’t nail down the exact conditions under which they produce different results. By the way, I’m thinking the DISTINCT keyword doesn’t make a difference on the u.id field, since all the ids are already unique. The purpose of the query is to count the number of unique (non-empty) last names. If the last name is empty, the row should count as unique.
I suppose the general case for this problem would be the use of an aggregate function within a case-when statement.
Count within Case-When:
SELECT
(case when (substr(u.name,40,40) <> ' ')
then count(distinct(substr(u.name,40,40)))
else count(u.id)
end) as "LAST_NAME"
FROM
users u
GROUP BY
substr(u.name,40,40)
Case-When within Count:
SELECT
count (distinct case when (substr(u.name,40,40) <> ' ')
then substr(u.name,40,40)
else to_char(u.id)
end) as "LAST_NAME"
FROM
users u
GROUP BY
substr(u.name,40,40)
If user.id is a PRIMARY KEY, these queries are semantically identical, though they are likely to produce different execution plans.
They will return 1 for all non-empty last names, since you are counting distinct values of the GROUP BY expression within its own group, which, by definition, will be exactly one.
For empty last names, the first query will essentially return COUNT(u.id) and the second one will return COUNT(DISTINCT TO_CHAR(u.id)), which, given that u.id is unique, is the same.
I believe you need to remove GROUP BY from the second query:
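A minimal sketch of that change, assuming the same users table and fixed-width name column as in the question. It is just the second query with the GROUP BY clause dropped (and without the trailing comma after the alias), so the count runs over the whole table instead of once per last name:

```sql
-- Sketch only: table, columns and the SUBSTR offsets are taken from the question.
SELECT
    COUNT(DISTINCT
        CASE
            -- Non-empty last name: count each distinct name once.
            WHEN SUBSTR(u.name, 40, 40) <> ' ' THEN SUBSTR(u.name, 40, 40)
            -- Empty last name: fall back to the unique id, so every such row counts.
            ELSE TO_CHAR(u.id)
        END
    ) AS "LAST_NAME"
FROM
    users u
```

With no grouping, each distinct non-empty last name adds one to the result, and each row with an empty last name adds one as well, because its u.id stands in for the name. One caveat under these assumptions: a last name that happens to look like an id (for example '42') would be merged with the row whose id is 42.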