My database is MS SQL 2008.
I”m basically merging some sets of data together from two or more databases to end up with one owner of a set of data possibly related by two fields.
Table
ID Name Code
1 Ben 1
2 Ben 1
3 Frank 1
4 Frank 2
5 Mark 2
6 Mary 3
7 Chuck 3
8 Rogue 10
9 Charles 11
The data is in no order, the “Parent” doesn’t matter, as long as there is one per group.
The children of the parent is a set of records that are related by having the same name, or same Code, or both. Each record can only appear once in the result IE. cant belong to more than one group.
Here is one possible result (hierarchy doesn’t have to be represented this way):
ID Name Code ParentID
1 Ben 1 NULL
2 Ben 1 1
3 Frank 1 1
4 Frank 2 1
5 Mark 2 1
6 Mary 3 NULL
7 Chuck 3 6
Record ID {1} is the parent of group 1 (picked because first of the common set)
{2} shares the same name so its included (also could be included because same Code)
{3} shares same code so its included
{4} shares same name as {3} so its included
{5} shares same code with {4} so its included
{6} and {7} share same code, so form a new group.
{8} and {9} are excluded from the result as there is no other common records.
I think ive come up with a solution at work which uses about 3 or 4 joins of this table on itself, and its rather convoluted. Any suggestions on how to tackle this? I sense maybe a use of recursive CTE’s but I cant wrap my brain around it.
I don’t think that a recursive CTE is going to work here. The query is based entirely upon sequential logic and there’s no logical “next set” for any given state because you can’t know where the stop point is without first scanning each row one-by-one; in other words, the results essentially need to be evaluated row-by-row. The purpose of using a recursive CTE is to be able to append sets; if you’re just appending rows then what you end up with is no better than a cursor.
I would actually use a CLR User-Defined Aggregate for something like this, because there is no good-performance pure SQL solution that I can think of, but if you need a pure SQL solution, here is one using regular (not recursive) CTEs and the windowing functions:
That gets the results you’re asking for. It could be written in a more compact form but I’ve tried to aim for readability.
(Addendum: We could use a recursive CTE and improve this significantly if it were known at the beginning that each and every name/code can only go under one “parent” – but that assumption is not documented anywhere, so we really have to assume worst-case.)