I hardly know how to state this question, let alone search for answers, but here’s my best shot. Assume I have a table:
Col1 | Col2
-----+-----
A    | 1
A    | 2
A    | 3
A    | 4
B    | 1
B    | 2
B    | 3
C    | 1
C    | 2
C    | 3
D    | 1
I want to find the subset of associations (rows) where:
- There are no duplicates in Col1
- There are no duplicates in Col2
- Every value in Col1 is associated with a value in Col2
So the above example could yield this result:
Col1 | Col2
-----+-----
A    | 4
B    | 2
C    | 3
D    | 1
Notice that A-4 must be in the result: there are 4 unique letters and 4 unique numbers, so every number has to be used. D can only take 1, and B and C can only share 2 and 3 between them, so if you don’t associate A with 4, no remaining subset maps every value in Col1 while keeping Col2 unique.
Also, notice that it would be equally valid to replace B-2 and C-3 with B-3 and C-2. I don’t care which subset is selected, but I want one that fulfills all the requirements.
Not every set of data will have a subset that fulfills all the requirements, but I want to get as close as possible, that is, to map as many Col1 values as I can.
I’m trying to do this with a SQL query. I had a query that seemed to accomplish this for one set of data, but then I had to rewrite it for a slightly different set (where Col2 is actually a pair of columns) and could not reproduce my earlier success. My first solution used MIN() and GROUP BY, plus a couple of joins on the aggregated results, to mark duplicates for elimination in a loop until there was nothing left to safely eliminate. My more recent solution replaces the GROUP BY queries with ROW_NUMBER() expressions that use PARTITION BY. But I can’t figure out how to handle the cases where there are multiple valid result sets from multiply cross-linked pairs like B and C in the above example. My earlier query might have handled it, but I can’t quite comprehend what I did (I must have had a good day when I wrote that one). Perhaps I need to do a JOIN on the ROW_NUMBER() expressions in my sub-queries? My brain gave out for today. I hope someone can help me find an ingeniously simple solution.
It seems to me that you’re aiming for something that SQL is not well suited to. This is a non-standard algorithmic task for a query language (it looks like what is usually called maximum bipartite matching), and I think you need a general-purpose programming language to achieve it. Your task reminds me of chess riddles.
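To make the "real programming language" suggestion concrete, here is a minimal sketch in Python. It assumes the rows have already been fetched from the table as (Col1, Col2) pairs, and it uses the standard augmenting-path approach to bipartite matching. The names max_matching and try_assign are made up for this illustration and are not from the question. Treat it as one possible approach under those assumptions, not a tuned implementation.

```python
from collections import defaultdict


def max_matching(rows):
    """Return a largest subset of (col1, col2) rows in which no col1 value
    and no col2 value appears twice (augmenting-path bipartite matching)."""
    # Candidate col2 values for each col1 value.
    candidates = defaultdict(list)
    for col1, col2 in rows:
        candidates[col1].append(col2)

    owner = {}  # col2 value -> col1 value currently assigned to it

    def try_assign(col1, seen):
        # Try to give col1 some col2 value, evicting the current owner
        # only if that owner can be moved to another col2 value.
        for col2 in candidates[col1]:
            if col2 in seen:
                continue
            seen.add(col2)
            if col2 not in owner or try_assign(owner[col2], seen):
                owner[col2] = col1
                return True
        return False

    for col1 in list(candidates):
        try_assign(col1, set())

    return sorted((col1, col2) for col2, col1 in owner.items())


# The table from the question, as (Col1, Col2) pairs.
rows = [
    ("A", 1), ("A", 2), ("A", 3), ("A", 4),
    ("B", 1), ("B", 2), ("B", 3),
    ("C", 1), ("C", 2), ("C", 3),
    ("D", 1),
]
result = max_matching(rows)
for col1, col2 in result:
    print(col1, col2)
```

On the example table this should return four rows that include A-4 and D-1; which of the two valid B/C pairings it picks depends on row order. When no complete assignment exists, it returns as many rows as can be kept without a duplicate in either column, which matches the "as close as possible" requirement. For very large inputs the recursion depth could become a concern, so an iterative variant may be preferable there.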