I have a database populated with 1 million objects. Each object has a ‘tags’ field – a set of integers.
For example:
object1: tags(1,3,4)
object2: tags(2)
object3: tags(3,4)
object4: tags(5)
and so on.
The query parameter is a set of integers; let's try q(3,4,5). An object matches only if all of its tags are in the query set:
object1 does not match ('1' is not in '3,4,5')
object2 does not match ('2' is not in '3,4,5')
object3 matches ('3' and '4' are in '3,4,5')
object4 matches ('5' is in '3,4,5')
How can I select the matching objects efficiently?
You’re making a common mistake in database design by storing a comma-separated list of tag IDs. It’s no surprise that querying this efficiently is a problem for you.
What you need is to model the mapping between objects and tags in a separate table.
Insert one row for each object/tag pairing. Of course, this means you have several rows for each object_id, but that’s okay. You can then query for all objects that have any of the tags 3, 4, 5.
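As a minimal sketch of the mapping table and the first query, here is a version using Python's built-in sqlite3 module with an in-memory database. The table name `object_tags` and the column name `tag_id` are assumptions made for illustration; only `object_id` comes from the text above, and your own schema and database engine may differ.

```python
import sqlite3

# In-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")

# Hypothetical mapping table: one row per object/tag pairing.
conn.execute("""
    CREATE TABLE object_tags (
        object_id INTEGER NOT NULL,
        tag_id    INTEGER NOT NULL,
        PRIMARY KEY (object_id, tag_id)
    )
""")

# The four example objects from the question.
rows = [(1, 1), (1, 3), (1, 4), (2, 2), (3, 3), (3, 4), (4, 5)]
conn.executemany(
    "INSERT INTO object_tags (object_id, tag_id) VALUES (?, ?)", rows
)

# Objects that have at least one of the tags 3, 4, 5.
any_match = [
    r[0]
    for r in conn.execute(
        "SELECT DISTINCT object_id FROM object_tags "
        "WHERE tag_id IN (3, 4, 5) ORDER BY object_id"
    )
]
print(any_match)  # [1, 3, 4]
```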
But this matches object1, which you don’t want. You want to exclude objects that have other tags not in 3,4,5.
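One possible way to express that exclusion is to group by object and keep only the groups that contain no tag outside the query set. The sketch below again uses Python's sqlite3 module; the `object_tags` table, the `tag_id` column, and the helper `objects_within` are hypothetical names chosen for this example, and other formulations (such as an outer join or NOT EXISTS) would also work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical mapping table: one row per object/tag pairing.
conn.execute("""
    CREATE TABLE object_tags (
        object_id INTEGER NOT NULL,
        tag_id    INTEGER NOT NULL,
        PRIMARY KEY (object_id, tag_id)
    )
""")
conn.executemany(
    "INSERT INTO object_tags (object_id, tag_id) VALUES (?, ?)",
    [(1, 1), (1, 3), (1, 4), (2, 2), (3, 3), (3, 4), (4, 5)],
)


def objects_within(conn, query_tags):
    """Return ids of objects whose tags all belong to query_tags."""
    # One "?" placeholder per query tag, so values are bound safely.
    placeholders = ",".join("?" for _ in query_tags)
    sql = f"""
        SELECT object_id
        FROM object_tags
        GROUP BY object_id
        -- Count tags outside the query set; keep objects with none.
        HAVING SUM(CASE WHEN tag_id NOT IN ({placeholders})
                        THEN 1 ELSE 0 END) = 0
        ORDER BY object_id
    """
    return [r[0] for r in conn.execute(sql, list(query_tags))]


print(objects_within(conn, [3, 4, 5]))  # [3, 4]
```

Note that this grouping reads every row of the mapping table; whether that is fast enough for a million objects depends on your database engine and indexes, so treat it as a starting point to measure rather than a tuned solution.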