This question reminded me of a couple of related problems with whole-set comparison. Given:
- a `collection` of sets, and
- a `probe` set

Three questions:
- How do you find all sets in `collection` that match `probe`, element for element?
- How do you find all sets in `collection` that match a collection of `probes`, without the use of explicit looping constructs? How do you join sets of sets?
- Is this relational division? If not, what is it?
I have a decent solution to question 1 (see below).
I don’t have a decent relational solution to question 2. Any takers?
Test data:
IF OBJECT_ID('tempdb..#elements') IS NOT NULL DROP TABLE #elements
IF OBJECT_ID('tempdb..#sets') IS NOT NULL DROP TABLE #sets
CREATE TABLE #sets (set_no INT, PRIMARY KEY (set_no))
CREATE TABLE #elements (set_no INT, elem CHAR(1), PRIMARY KEY (set_no, elem))
INSERT #elements VALUES (1, 'A')
INSERT #elements VALUES (1, 'B')
INSERT #elements VALUES (1, 'C')
INSERT #elements VALUES (1, 'D')
INSERT #elements VALUES (1, 'E')
INSERT #elements VALUES (1, 'F')
INSERT #elements VALUES (2, 'A')
INSERT #elements VALUES (2, 'B')
INSERT #elements VALUES (2, 'C')
INSERT #elements VALUES (3, 'D')
INSERT #elements VALUES (3, 'E')
INSERT #elements VALUES (3, 'F')
INSERT #elements VALUES (4, 'B')
INSERT #elements VALUES (4, 'C')
INSERT #elements VALUES (4, 'F')
INSERT #elements VALUES (5, 'F')
INSERT #sets SELECT DISTINCT set_no FROM #elements
Setup and solution for question 1, set lookup:
IF OBJECT_ID('tempdb..#probe') IS NOT NULL DROP TABLE #probe
CREATE TABLE #probe (elem CHAR(1), PRIMARY KEY (elem))
INSERT #probe VALUES ('B')
INSERT #probe VALUES ('C')
INSERT #probe VALUES ('F')
-- I think this works.....upvotes for anyone who can demonstrate otherwise
SELECT set_no FROM #sets s
WHERE NOT EXISTS (
SELECT * FROM #elements i WHERE i.set_no = s.set_no AND NOT EXISTS (
SELECT * FROM #probe p WHERE p.elem = i.elem))
AND NOT EXISTS (
SELECT * FROM #probe p WHERE NOT EXISTS (
SELECT * FROM #elements i WHERE i.set_no = s.set_no AND i.elem = p.elem))
Setup for question 2, no solution:
IF OBJECT_ID('tempdb..#multi_probe') IS NOT NULL DROP TABLE #multi_probe
CREATE TABLE #multi_probe (probe_no INT, elem CHAR(1), PRIMARY KEY (probe_no, elem))
INSERT #multi_probe VALUES (1, 'B')
INSERT #multi_probe VALUES (1, 'C')
INSERT #multi_probe VALUES (1, 'F')
INSERT #multi_probe VALUES (2, 'C')
INSERT #multi_probe VALUES (2, 'F')
INSERT #multi_probe VALUES (3, 'A')
INSERT #multi_probe VALUES (3, 'B')
INSERT #multi_probe VALUES (3, 'C')
-- some magic here.....
-- result set:
-- probe_no | set_no
------------|--------
-- 1 | 4
-- 3 | 2
OK, let’s solve question 2 step by step:
(1) Inner join the sets and the probes on their individual elements. This shows how the test sets and probe sets relate, i.e. which sets have which elements in common with which probe:
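A minimal sketch of step (1), assuming the question's `#elements` and `#multi_probe` temp tables already exist in the current session (it uses those temp tables rather than table variables; the aliases `e` and `p` are my own):

```sql
-- Step 1: pair every test-set element with every probe element of the same value.
-- Assumes #elements and #multi_probe were created and filled by the question's setup.
SELECT e.set_no, p.probe_no, e.elem
FROM #elements AS e
INNER JOIN #multi_probe AS p
    ON p.elem = e.elem
ORDER BY e.set_no, p.probe_no, e.elem;
```

Each output row is one element shared by a (set, probe) pair; pairs with nothing in common produce no rows at all.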
Result:
(2) Count how many elements each test set and probe set have in common (the inner join has already left the "no matches" pairs aside):
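Step (2) could be sketched as a `GROUP BY` over the same join, again assuming the question's temp tables; the column name `common_count` is a name I picked for illustration:

```sql
-- Step 2: one row per (set, probe) pair with the number of shared elements.
-- Assumes #elements and #multi_probe from the question's setup.
SELECT e.set_no, p.probe_no, COUNT(*) AS common_count
FROM #elements AS e
INNER JOIN #multi_probe AS p
    ON p.elem = e.elem
GROUP BY e.set_no, p.probe_no
ORDER BY e.set_no, p.probe_no;
```

Because both tables have a primary key on (set or probe number, `elem`), an element can match at most once per pair, so `COUNT(*)` is the size of the intersection.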
Result:
(3) Add the total element count of the test set and of the probe set to each row (subqueries may not be the most elegant way to do this):
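One way to write step (3), as a sketch only: correlated subqueries in the select list fetch the full size of each set and each probe. The names `set_count`, `probe_count`, `e2` and `p2` are hypothetical, and the question's temp tables are assumed to exist:

```sql
-- Step 3: add the full size of the test set and of the probe set to each pair.
-- Assumes #elements and #multi_probe from the question's setup.
SELECT e.set_no,
       p.probe_no,
       COUNT(*) AS common_count,
       (SELECT COUNT(*) FROM #elements AS e2
         WHERE e2.set_no = e.set_no)        AS set_count,
       (SELECT COUNT(*) FROM #multi_probe AS p2
         WHERE p2.probe_no = p.probe_no)    AS probe_count
FROM #elements AS e
INNER JOIN #multi_probe AS p
    ON p.elem = e.elem
GROUP BY e.set_no, p.probe_no
ORDER BY e.set_no, p.probe_no;
```

Joining to two pre-aggregated derived tables would do the same job; the subqueries just keep the sketch close to the description.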
Result:
(4) Find the solution: retain only those pairs where the test set and the probe set have the same number of elements AND that number is also the number of common elements, i.e. the test set and the probe set are identical:
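Putting the steps together, a hedged sketch of the final query wraps the counts in a derived table (aliased `m` here, my own name) and filters on equality of the three counts. It assumes the question's `#elements` and `#multi_probe` temp tables:

```sql
-- Step 4: keep only (probe, set) pairs whose sizes both equal the overlap size,
-- which means the two sets contain exactly the same elements.
-- Assumes #elements and #multi_probe from the question's setup.
SELECT m.probe_no, m.set_no
FROM (
    SELECT e.set_no,
           p.probe_no,
           COUNT(*) AS common_count,
           (SELECT COUNT(*) FROM #elements AS e2
             WHERE e2.set_no = e.set_no)        AS set_count,
           (SELECT COUNT(*) FROM #multi_probe AS p2
             WHERE p2.probe_no = p.probe_no)    AS probe_count
    FROM #elements AS e
    INNER JOIN #multi_probe AS p
        ON p.elem = e.elem
    GROUP BY e.set_no, p.probe_no
) AS m
WHERE m.set_count = m.common_count
  AND m.probe_count = m.common_count
ORDER BY m.probe_no, m.set_no;
```

With the question's test data this should return the two pairs listed in the question's expected result: probe 1 with set 4, and probe 3 with set 2. Note that a set or probe with no elements can never appear here, since the inner join produces no rows for it.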
Result:
Excuse the `@`s instead of `#`s, I like table variables better 🙂