I would like to query a relational database if a set of items exists.
The data I am modeling are of the following form:
key1 = [ item1, item3, item5 ]
key2 = [ item2, item7 ]
key3 = [ item2, item3, item4, item5 ]
...
I am storing them in a table with the following schema
CREATE TABLE sets (key INTEGER, item INTEGER);
So for example, the following insert statements would insert the above three sets.
INSERT INTO sets VALUES ( key1, item1 );
INSERT INTO sets VALUES ( key1, item3 );
INSERT INTO sets VALUES ( key1, item5 );
INSERT INTO sets VALUES ( key2, item2 );
INSERT INTO sets VALUES ( key2, item7 );
INSERT INTO sets VALUES ( key3, item2 );
INSERT INTO sets VALUES ( key3, item3 );
INSERT INTO sets VALUES ( key3, item4 );
INSERT INTO sets VALUES ( key3, item5 );
Given a set of items, I would like the key associated with the set if it is stored in the table and NULL if it is not. Is it possible to do this with an sql query? If so, please provide details.
Details that may be relevant:
- I am primarily interested in the database design / query strategy, though I will eventually implement this in MySQL and preform the query from with in python using the mysql-python package.
- I have the freedom to restructure the database schema if a different layout would be more convenient for this type of query.
- Each set, if it exists is supposed to be unique.
- I am not interested in partial matches.
- The database scale is on the order of < 1000 sets each of which contains < 10 items each, so performance at this point is not a priority.
Thanks in advance.
I won’t comment on whether there is a better suited schema for doing this (it’s quite possible), but for a schema having columns
nameanditem, the following query should work. (mysql syntax)The idea is that we have all the set keys in
k, which we then join with the set item data insetsonce for each set item in the set we are searching for, three in this case. Each of the three inner joins with table aliasesi1,i2andi3filter out all set names that don’t contain the item searched for with that join. Finally, we have a left join withsetswith table aliasix, which brings in all the extra items in the set, that is, every item we were not searching for.ix.nameisNULLin the case that no extra items are found, which is exactly what we want, thus theWHEREclause. The query returns a row containing the set key if the set is found, no rows otherwise.Edit: The idea behind collapsars answer seems to be much better than mine, so here’s a bit shorter version of that with explanation.
The idea here is that subquery
s1selects the keys of all sets that contain items other that the ones we are looking for. Thus, when we left joinsetswiths1,s1.nameisNULLwhen the set only contains items we are searching for. We then group by set key and filter out any sets having the wrong number of items. We are then left with only sets which contain only items we are searching for and are of the correct length. Since sets can only contain an item once, there can only be one set satisfying that criteria, and that’s the one we’re looking for.Edit: It just dawned on me how to do this without the exclusion.
The first subquery finds the total count of items in each set and the second one finds out the count of matching items in each set. When
matches.countis 3, the set has all the items we’re looking for, and iftotals.countis also 3, the set doesn’t have any extra items.