I have two tables in SQL Server. One contains IDs for files and the slides contained in those original files; the other is for “sections” that can contain slides from one or more of the files, potentially in arbitrary order, duplicated, and/or with some slides eliminated.
Sample data looks like this:
FileSlide
FileID SlideID
214 716
214 717
214 718
223 770
223 771
223 772
223 773
223 774
223 775
SectionSlide
SectionID SlideID
527 716
527 718
527 717
527 770
527 773
527 774
527 775
527 774
I originally didn’t need a “SectionFile” relation, but now I need that information to see which files were chosen for a particular section, regardless of slide details. My problem is comparing the slide IDs in the SectionSlide and FileSlide tables to determine whether the slides of a given File-Section pair overlap. I would like to find all File-Section pairs that share at least one slide.
For the sample data above, output would look like this:
SectionFileCandidates
SectionID FileID
527 214
527 223
What is the query to produce this output?
Is it possible to calculate a metric that indicates what proportion of the original file’s slides exists in the section?
For the sample data above, output would look like this:
SectionFileCandidates
SectionID FileID Overlap
527 214 1.00
527 223 0.67
…that is, 3 out of 3 slides from file 214 are in section 527, and 4 out of 6 slides from file 223 (770, 773, 774 and 775; the duplicated 774 counts only once) are in section 527.
I was originally trying to compare groups of rows using the OVER (PARTITION BY ...) clause, but could not figure it out.
How can I do these two queries?
Both queries are possible!
First query:
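A sketch of one way to write it: join the two tables on SlideID and group by the (SectionID, FileID) pair, so every pair that shares at least one slide appears exactly once. To keep the example self-contained and runnable, it loads the sample data into an in-memory SQLite database through Python's standard sqlite3 module; that is a stand-in for SQL Server, but the query uses only a plain join and GROUP BY, which SQL Server accepts as well.

```python
import sqlite3

# Load the question's sample data into an in-memory SQLite database
# (a stand-in for SQL Server so the example runs anywhere).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FileSlide (FileID INTEGER, SlideID INTEGER);
    CREATE TABLE SectionSlide (SectionID INTEGER, SlideID INTEGER);
""")
conn.executemany(
    "INSERT INTO FileSlide VALUES (?, ?)",
    [(214, s) for s in (716, 717, 718)] + [(223, s) for s in range(770, 776)],
)
conn.executemany(
    "INSERT INTO SectionSlide VALUES (?, ?)",
    [(527, s) for s in (716, 718, 717, 770, 773, 774, 775, 774)],
)

# Every section/file pair that shares at least one SlideID.
# GROUP BY collapses the many matching slide rows into one row per pair.
QUERY = """
SELECT ss.SectionID, fs.FileID
FROM SectionSlide AS ss
JOIN FileSlide AS fs ON fs.SlideID = ss.SlideID
GROUP BY ss.SectionID, fs.FileID
ORDER BY ss.SectionID, fs.FileID
"""

rows = conn.execute(QUERY).fetchall()
for section_id, file_id in rows:
    print(section_id, file_id)
```

With the sample data this prints `527 214` and `527 223`, matching the expected SectionFileCandidates output.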
or
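An equivalent sketch uses SELECT DISTINCT over the same join instead of GROUP BY; both remove the duplicate pairs produced by several shared slides. The sample data is again loaded into an in-memory SQLite database via Python's sqlite3 module as a stand-in for SQL Server. The ORDER BY columns also appear in the select list, which SQL Server requires when SELECT DISTINCT is used.

```python
import sqlite3

# Sample data from the question, in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FileSlide (FileID INTEGER, SlideID INTEGER);
    CREATE TABLE SectionSlide (SectionID INTEGER, SlideID INTEGER);
""")
conn.executemany(
    "INSERT INTO FileSlide VALUES (?, ?)",
    [(214, s) for s in (716, 717, 718)] + [(223, s) for s in range(770, 776)],
)
conn.executemany(
    "INSERT INTO SectionSlide VALUES (?, ?)",
    [(527, s) for s in (716, 718, 717, 770, 773, 774, 775, 774)],
)

# DISTINCT keeps one row per (SectionID, FileID) pair that shares a slide.
QUERY = """
SELECT DISTINCT ss.SectionID, fs.FileID
FROM SectionSlide AS ss
JOIN FileSlide AS fs ON fs.SlideID = ss.SlideID
ORDER BY ss.SectionID, fs.FileID
"""

rows = conn.execute(QUERY).fetchall()
for section_id, file_id in rows:
    print(section_id, file_id)
```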
Second query:
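A sketch of one way to compute the proportion: count the distinct shared slides per (SectionID, FileID) pair and divide by the file's total number of distinct slides, taken from a per-file count in a derived table. COUNT(DISTINCT ...) matters because section 527 contains slide 774 twice, and multiplying by 1.0 avoids integer division, which both SQL Server and SQLite would otherwise perform on two integers. The sample data is loaded into an in-memory SQLite database via Python's sqlite3 module, standing in for SQL Server.

```python
import sqlite3

# Sample data from the question, in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FileSlide (FileID INTEGER, SlideID INTEGER);
    CREATE TABLE SectionSlide (SectionID INTEGER, SlideID INTEGER);
""")
conn.executemany(
    "INSERT INTO FileSlide VALUES (?, ?)",
    [(214, s) for s in (716, 717, 718)] + [(223, s) for s in range(770, 776)],
)
conn.executemany(
    "INSERT INTO SectionSlide VALUES (?, ?)",
    [(527, s) for s in (716, 718, 717, 770, 773, 774, 775, 774)],
)

# Overlap = distinct slides of the file found in the section
#           / total distinct slides in the file (from derived table t).
QUERY = """
SELECT ss.SectionID,
       fs.FileID,
       ROUND(1.0 * COUNT(DISTINCT ss.SlideID) / t.TotalSlides, 2) AS Overlap
FROM SectionSlide AS ss
JOIN FileSlide AS fs ON fs.SlideID = ss.SlideID
JOIN (SELECT FileID, COUNT(DISTINCT SlideID) AS TotalSlides
      FROM FileSlide
      GROUP BY FileID) AS t ON t.FileID = fs.FileID
GROUP BY ss.SectionID, fs.FileID, t.TotalSlides
ORDER BY ss.SectionID, fs.FileID
"""

rows = conn.execute(QUERY).fetchall()
for section_id, file_id, overlap in rows:
    print(section_id, file_id, f"{overlap:.2f}")
```

With the sample data this prints `527 214 1.00` and `527 223 0.67`. Pairs that share no slides do not appear at all. If you prefer the OVER (PARTITION BY ...) form, the derived table could be replaced by a column such as `COUNT(*) OVER (PARTITION BY FileID)` computed on FileSlide and then read with `MAX(...)` in the outer aggregate; SQL Server does not allow DISTINCT inside a windowed COUNT, so that variant assumes FileSlide has no duplicate (FileID, SlideID) rows.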