I know there must be a better way to do this and I’m brain dead today.
I have two tables:
Reference
Id Label
1 Apple
2 Banana
3 Cherry
Elements
Id ReferenceId P1 P2 Qty
1 1 1 2 8
2 2 2 3 14
3 1 3 2 1
4 3 2 1 6
5 3 1 2 3
I want to group these primarily by (P1, P2), but independently of the order of P1 and P2, so that (1,2) and (2,1) map to the same group. That part is fine.
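The order-independent grouping amounts to normalizing each pair so the smaller value comes first. That is what the two `case when (p1<p2)` expressions in the query below do. Here is a minimal Python sketch of that key over the sample Elements rows. The names `pair_key` and `group_ids_by_pair` are illustrative and do not come from the post:

```python
from collections import defaultdict

# Sample Elements rows from the question: (Id, ReferenceId, P1, P2, Qty)
ELEMENTS = [
    (1, 1, 1, 2, 8),
    (2, 2, 2, 3, 14),
    (3, 1, 3, 2, 1),
    (4, 3, 2, 1, 6),
    (5, 3, 1, 2, 3),
]

def pair_key(p1, p2):
    """Order-independent key, mirroring the two CASE WHEN p1 < p2 expressions."""
    return (p1, p2) if p1 < p2 else (p2, p1)

def group_ids_by_pair(rows):
    """Map each normalized (P1, P2) pair to the Elements ids that fall into it."""
    groups = defaultdict(list)
    for row_id, _ref_id, p1, p2, _qty in rows:
        groups[pair_key(p1, p2)].append(row_id)
    return dict(groups)

if __name__ == "__main__":
    print(group_ids_by_pair(ELEMENTS))  # {(1, 2): [1, 4, 5], (2, 3): [2, 3]}
```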
The other part is that I want the label with the largest SUM(Qty) for a given (P1, P2) pair. In other words, I want the result set to be:
P1 P2 TotalQty MostRepresentativeLabel
1 2 17 Cherry
2 3 15 Banana
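As a check on that expected result, the sample rows add up as follows. Elements 1, 4 and 5 fall into the pair {1,2}, and Elements 2 and 3 fall into {2,3}:

$$
\begin{aligned}
\{1,2\}:&\quad \text{Apple} = 8,\quad \text{Cherry} = 6 + 3 = 9,\quad \text{TotalQty} = 8 + 9 = 17,\quad \arg\max = \text{Cherry}\\
\{2,3\}:&\quad \text{Banana} = 14,\quad \text{Apple} = 1,\quad \text{TotalQty} = 14 + 1 = 15,\quad \arg\max = \text{Banana}
\end{aligned}
$$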
All I can come up with is this awful mess:
select a.endpoint1, a.endpoint2, b.totalTotal, a.mostRepresentativeLabelByQty from
(
-- Qty per (unordered pair, label); the CASE expressions make (1,2) and (2,1) the same pair
select SUM(qty) as total
,case when (p1<p2) then p1 else p2 end as endpoint1
,case when (p1<p2) then p2 else p1 end as endpoint2
,reference.label as mostRepresentativeLabelByQty
from elements inner join reference on elements.ReferenceId = reference.id
group by case when (p1<p2) then p1 else p2 end
,case when (p1<p2) then p2 else p1 end
,label
) a inner join
(
-- per pair: the largest label total and the grand total
select endpoint1, endpoint2, MAX(total) as highestTotal, SUM(total) as totalTotal from
(
select SUM(qty) as total
,case when (p1<p2) then p1 else p2 end as endpoint1
,case when (p1<p2) then p2 else p1 end as endpoint2
,reference.label as mostRepresentativeLabelByQty
from elements inner join reference on elements.ReferenceId = reference.id
group by case when (p1<p2) then p1 else p2 end
,case when (p1<p2) then p2 else p1 end
,label
) byLabel
group by endpoint1, endpoint2
) b
-- match on the pair as well as the total, so a label total in one pair
-- cannot match the highest total of a different pair
on a.endpoint1 = b.endpoint1
and a.endpoint2 = b.endpoint2
and a.total = b.highestTotal
Which... works, but I'm not convinced. This is ultimately going to run on much larger datasets (200,000 rows or so), so I don't like this approach. Is there a simpler way to express “use the value from this column where some other column is maximized” that I'm totally blanking on?
(SQL Server 2008 R2 by the way)
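A common way to express “the value from this column where another column is maximized” is to rank rows with ROW_NUMBER() and keep rank 1. This sketch is a suggestion and is not taken from the answer below. The query uses only CTEs, CASE, SUM() OVER (PARTITION BY ...) and ROW_NUMBER(), all of which SQL Server 2005 and later support. Here it runs against an in-memory SQLite database (window functions need SQLite 3.25 or later) loaded with the sample rows. Table and column names follow the question's sample schema, so the join uses `ReferenceId` rather than `fkId`:

```python
import sqlite3

# Portable sketch: CTEs, CASE, SUM() OVER and ROW_NUMBER() exist in SQL Server 2005+
# and in SQLite 3.25+ (SQLite is used here only so the query can be run locally).
QUERY = """
WITH byLabel AS (
    -- Qty per (unordered pair, label), computed once
    SELECT
        CASE WHEN e.P1 < e.P2 THEN e.P1 ELSE e.P2 END AS endpoint1,
        CASE WHEN e.P1 < e.P2 THEN e.P2 ELSE e.P1 END AS endpoint2,
        r.Label AS label,
        SUM(e.Qty) AS labelQty
    FROM Elements e
    INNER JOIN Reference r ON e.ReferenceId = r.Id
    GROUP BY
        CASE WHEN e.P1 < e.P2 THEN e.P1 ELSE e.P2 END,
        CASE WHEN e.P1 < e.P2 THEN e.P2 ELSE e.P1 END,
        r.Label
),
ranked AS (
    -- Pair total, plus rank 1 for the label with the largest Qty in its pair
    SELECT endpoint1, endpoint2, label,
        SUM(labelQty) OVER (PARTITION BY endpoint1, endpoint2) AS totalQty,
        ROW_NUMBER() OVER (PARTITION BY endpoint1, endpoint2
                           ORDER BY labelQty DESC, label) AS rn
    FROM byLabel
)
SELECT endpoint1 AS P1, endpoint2 AS P2,
       totalQty AS TotalQty, label AS MostRepresentativeLabel
FROM ranked
WHERE rn = 1
ORDER BY P1, P2;
"""

def build_sample_db():
    """In-memory copy of the question's sample Reference and Elements tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Reference (Id INTEGER PRIMARY KEY, Label TEXT);
        CREATE TABLE Elements (Id INTEGER PRIMARY KEY, ReferenceId INTEGER,
                               P1 INTEGER, P2 INTEGER, Qty INTEGER);
        INSERT INTO Reference VALUES (1, 'Apple'), (2, 'Banana'), (3, 'Cherry');
        INSERT INTO Elements VALUES
            (1, 1, 1, 2, 8), (2, 2, 2, 3, 14), (3, 1, 3, 2, 1),
            (4, 3, 2, 1, 6), (5, 3, 1, 2, 3);
    """)
    return conn

def most_representative(conn):
    """Run the ranking query and return (P1, P2, TotalQty, Label) rows."""
    return conn.execute(QUERY).fetchall()

if __name__ == "__main__":
    for row in most_representative(build_sample_db()):
        print(row)
```

This version writes the per-label aggregation once instead of twice. The trailing `label` in the ROW_NUMBER ordering breaks ties deterministically. Whether it beats the self-join at 200,000 rows depends on the actual execution plan and indexes, so comparing both on real data is worthwhile.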
I use the sum of the BINARY_CHECKSUMs of P1 and P2 to identify each group. Because addition is commutative, (1,2) and (2,1) produce the same sum. This sum is exposed under the BC alias, and grouping on it makes it possible to find the label with the largest total in each group.
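This sketch shows why a summed-checksum key groups (1,2) with (2,1), and why that key is not guaranteed to be unique. `stand_in_checksum` is a hypothetical substitute (CRC-32 via Python's `zlib`), not BINARY_CHECKSUM itself. `identity` is a deliberately weak checksum that is used only to make a collision visible:

```python
import zlib

def stand_in_checksum(value):
    """HYPOTHETICAL stand-in for BINARY_CHECKSUM (SQL Server's algorithm is not
    reproduced here): CRC-32 of the decimal text, mapped into the signed 32-bit range."""
    c = zlib.crc32(str(value).encode("ascii"))
    return c - (1 << 32) if c >= (1 << 31) else c

def sum_key(p1, p2, checksum=stand_in_checksum):
    """Group key built like the BC alias: checksum(P1) + checksum(P2)."""
    # Addition is commutative, so (p1, p2) and (p2, p1) always get the same key.
    return checksum(p1) + checksum(p2)

def identity(value):
    """Deliberately weak stand-in checksum, used to show that sums can collide."""
    return value

if __name__ == "__main__":
    print(sum_key(1, 2) == sum_key(2, 1))                      # True: order-independent
    print(sum_key(1, 4, identity) == sum_key(2, 3, identity))  # True: different pairs, same key
    print((min(1, 4), max(1, 4)) == (min(2, 3), max(2, 3)))    # False: normalized pairs differ
```

The normalized (min, max) pair produced by the CASE expressions in the question cannot collide like this. The trade-off is that it requires grouping on two columns instead of one.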
Result:
EDIT: Wrap each BINARY_CHECKSUM in ABS to maximize the entropy of the sums of each group's BINARY_CHECKSUMs. Because BINARY_CHECKSUM returns a signed INT, this reduces the chance of a collision between two different groups where a positive BINARY_CHECKSUM is summed with a negative BINARY_CHECKSUM.
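One range detail is worth checking with this approach. SQL Server INT is signed 32-bit, and INT arithmetic that leaves that range raises an arithmetic overflow error. ABS of the most negative INT, and the sum of two large ABS'd checksums, both fall outside the range. The arithmetic is sketched below with Python's unbounded integers. It assumes the sum is computed in INT. Casting each checksum to BIGINT before ABS and addition, for example `ABS(CAST(BINARY_CHECKSUM(P1) AS BIGINT))`, would keep every possible sum in range:

```python
# SQL Server INT is a signed 32-bit integer; BIGINT is signed 64-bit.
INT_MIN = -2**31
INT_MAX = 2**31 - 1
BIGINT_MAX = 2**63 - 1

def fits_in_int(x):
    """True if x is representable as a SQL Server INT."""
    return INT_MIN <= x <= INT_MAX

def abs_checksum_sum(c1, c2):
    """Exact value of ABS(c1) + ABS(c2), using Python's unbounded ints."""
    return abs(c1) + abs(c2)

if __name__ == "__main__":
    # ABS of the most negative INT is 2**31, which has no INT representation.
    print(fits_in_int(abs(INT_MIN)))                        # False
    # Two large positive checksums overflow INT once added.
    print(fits_in_int(abs_checksum_sum(INT_MAX, INT_MAX)))  # False
    # In 64 bits, even the largest possible sum (2**32) fits comfortably.
    print(abs_checksum_sum(INT_MIN, INT_MIN) <= BIGINT_MAX) # True
```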