I’m trying to find all records in a database that have exactly the same set of child records as a given set. That may not sound very clear, but hang on, I’ll explain.
There is a table Barcode with fields Barcode and Document:
Barcode | Document
________|_________
A | ABC
A | CDE
A | EFG
B | XYZ
B | VWX
C | ABC
D | ABC
D | CDE
D | EFG
E | EFG
Notice that barcodes A and D have exactly the same set of documents. The document sets of barcodes C and E are subsets of that set.
Then I have a set of documents coming in to the function, say ABC, CDE, EFG. It comes in as a List. (Barcode information is stored in SQL Server and retrieved via LINQ to SQL.) For this set of documents I need to find all the matching barcodes: A and D. Barcodes that contain only a subset of the documents, such as C and E, should be ignored.
At the moment I have a recursive function that walks through all the documents and filters them by parts of the incoming set of documents. This gives me the matching sets, but it also includes subsets (like C and E), which I then filter out.
I believe this is not the best solution for the problem and there must be a more elegant one. But I struggle to think of any other way to do it.
Any suggestions?
p.s.
I hope the explanation is clear enough. I can provide the code I have, if somebody is masochistic enough :-)
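You can find every row whose Document is contained in your input list, group those rows by Barcode, and select the barcodes whose row count equals the number of documents in the input list. This assumes each (Barcode, Document) pair is stored only once. Note that a barcode holding all the input documents plus some others also passes this test, so if that can happen, additionally check that the barcode's total document count equals the input count.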
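The group-and-count idea could be sketched roughly as below. This is only an illustration over an in-memory sequence of (Barcode, Document) tuples. The names BarcodeMatcher and FindExactMatches are made up for the example, and the extra total-count check is an addition to guard against barcodes that carry more documents than the input. With LINQ to SQL the same query shape would be written against the table instead, which is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; not part of any library.
public static class BarcodeMatcher
{
    // Returns the barcodes whose document set equals the input set exactly.
    // Assumes each (Barcode, Document) pair appears at most once in rows.
    public static List<string> FindExactMatches(
        IEnumerable<(string Barcode, string Document)> rows,
        IEnumerable<string> documents)
    {
        var wanted = documents.Distinct().ToList();

        return rows
            .GroupBy(r => r.Barcode)
            // Every input document is present for this barcode...
            .Where(g => g.Count(r => wanted.Contains(r.Document)) == wanted.Count
                     // ...and the barcode has no documents beyond the input.
                     && g.Count() == wanted.Count)
            .Select(g => g.Key)
            .OrderBy(b => b)
            .ToList();
    }
}
```

For the sample table and the input ABC, CDE, EFG this yields A and D. C and E are dropped because they match only one of the three documents.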
Alternatively, if you will be running this query often and the set of documents doesn’t change much, you can create an in-memory index of the set of barcodes for each document (Dictionary<Document, ISet<Barcode>>). Then, when you have a set of documents you need to find barcodes for, you can iterate over the documents and intersect their barcode sets.
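A minimal sketch of that index, assuming documents and barcodes are plain strings. BarcodeIndex, Build and FindContainingAll are hypothetical names chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; not part of any library.
public static class BarcodeIndex
{
    // Builds the document -> set-of-barcodes index from (Barcode, Document) pairs.
    public static Dictionary<string, ISet<string>> Build(
        IEnumerable<(string Barcode, string Document)> rows)
    {
        var index = new Dictionary<string, ISet<string>>();
        foreach (var (barcode, document) in rows)
        {
            if (!index.TryGetValue(document, out var barcodes))
            {
                barcodes = new HashSet<string>();
                index[document] = barcodes;
            }
            barcodes.Add(barcode);
        }
        return index;
    }

    // Intersects the barcode sets of all input documents, giving the
    // barcodes that contain every one of them.
    public static ISet<string> FindContainingAll(
        Dictionary<string, ISet<string>> index,
        IEnumerable<string> documents)
    {
        var result = new HashSet<string>();
        var first = true;
        foreach (var document in documents)
        {
            // A document no barcode has means nothing can match.
            if (!index.TryGetValue(document, out var barcodes))
                return new HashSet<string>();

            if (first) { result.UnionWith(barcodes); first = false; }
            else result.IntersectWith(barcodes);
        }
        return result;
    }
}
```

For the sample table, intersecting the sets for ABC, CDE and EFG leaves A and D. The intersection alone would also keep a barcode that has extra documents, so an exact match would additionally need each barcode's total document count compared against the input count, for example from a second barcode-to-count dictionary.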