I often find myself adding expressions to the group by clause that I am sure are unique. Sometimes I turn out to be wrong, because of an error in my SQL or a mistaken assumption, and the expression is not really unique.
In many cases I would much rather this raised a SQL error than silently, and sometimes very subtly, expanded my result set.
I would love to be able to do something like:
select product_id, unique description from product group by product_id
Obviously I can’t implement that syntax myself, but something nearly as concise can be implemented with user-defined aggregates on some databases.
Would a special aggregate that only allows one unique input value be generally helpful in all versions of SQL? If so, could such a thing be implemented now on most databases? null values should be treated just like any other value, unlike built-in aggregates such as avg, which typically ignore them. (I have added answers with ways of implementing this for postgres and Oracle.)
The following example is intended to show how the aggregate would be used, but it is a simple case where it is obvious which expressions should be unique. Real usage would more likely be in larger queries, where it is easier to make mistaken assumptions about uniqueness.
tables:

product:
 product_id | description
------------+-------------
          1 | anvil
          2 | brick
          3 | clay
          4 | door

sale:
 sale_id | product_id |  cost
---------+------------+---------
       1 |          1 | £100.00
       2 |          1 | £101.00
       3 |          1 | £102.00
       4 |          2 |   £3.00
       5 |          2 |   £3.00
       6 |          2 |   £3.00
       7 |          3 |  £24.00
       8 |          3 |  £25.00
queries:

> select * from product join sale using (product_id);
 product_id | description | sale_id |  cost
------------+-------------+---------+---------
          1 | anvil       |       1 | £100.00
          1 | anvil       |       2 | £101.00
          1 | anvil       |       3 | £102.00
          2 | brick       |       4 |   £3.00
          2 | brick       |       5 |   £3.00
          2 | brick       |       6 |   £3.00
          3 | clay        |       7 |  £24.00
          3 | clay        |       8 |  £25.00

> select product_id, description, sum(cost)
  from product join sale using (product_id)
  group by product_id, description;
 product_id | description |   sum
------------+-------------+---------
          2 | brick       |   £9.00
          1 | anvil       | £303.00
          3 | clay        |  £49.00

> select product_id, solo(description), sum(cost)
  from product join sale using (product_id)
  group by product_id;
 product_id | solo  |   sum
------------+-------+---------
          1 | anvil | £303.00
          3 | clay  |  £49.00
          2 | brick |   £9.00

error case:
> select solo(description) from product;
ERROR: This aggregate only allows one unique input
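
To make the intended behaviour concrete, here is a minimal sketch of such an aggregate using Python's built-in sqlite3 module, with SQLite as a stand-in database. This is only an illustration of the idea, not the postgres or Oracle implementation mentioned in the question. The class name Solo, the helper solo_fails and the plain-integer costs are assumptions made for the sketch.

```python
import sqlite3


class Solo:
    """User-defined aggregate: returns the single distinct input value.

    NULL (None in Python) is treated like any other value, so a group
    containing both NULL and 'anvil' is rejected.
    """

    _UNSET = object()  # sentinel meaning "no row seen yet" (distinct from NULL)

    def __init__(self):
        self.value = Solo._UNSET

    def step(self, value):
        if self.value is Solo._UNSET:
            self.value = value  # first row of the group
        elif self.value != value:
            # None != None is False in Python, so repeated NULLs are accepted
            raise ValueError("This aggregate only allows one unique input")

    def finalize(self):
        # Assumption for this sketch: an empty group yields NULL
        return None if self.value is Solo._UNSET else self.value


conn = sqlite3.connect(":memory:")
conn.create_aggregate("solo", 1, Solo)
conn.executescript("""
    create table product (product_id integer primary key, description text);
    create table sale (sale_id integer primary key, product_id integer, cost integer);
    insert into product values (1, 'anvil'), (2, 'brick'), (3, 'clay'), (4, 'door');
    insert into sale values (1, 1, 100), (2, 1, 101), (3, 1, 102),
                            (4, 2, 3), (5, 2, 3), (6, 2, 3),
                            (7, 3, 24), (8, 3, 25);
""")

# Same query as above: description is unique per product_id, so it succeeds
rows = conn.execute("""
    select product_id, solo(description), sum(cost)
    from product join sale using (product_id)
    group by product_id
    order by product_id
""").fetchall()


def solo_fails(sql):
    """Return True if running sql raises a database error."""
    try:
        conn.execute(sql).fetchall()
    except sqlite3.Error:
        return True
    return False


print(rows)
print(solo_fails("select solo(description) from product"))
```

The sentinel object is what lets the sketch tell "no input yet" apart from a genuine NULL input. In SQLite the exception raised in step surfaces as a generic database error rather than the custom message, so the wording of the error differs from the postgres output shown above.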
Here is my implementation for postgres (edited to treat null as a unique value too):
example tables for testing: