Background
My program is storing a series of objects, a set of tags, and the many-to-many associations between tags and objects, in a MySQL database. To give you an idea of the structure:
CREATE TABLE objects (
object_id INT PRIMARY KEY,
...
);
CREATE TABLE tags (
tag_name VARCHAR(32) NOT NULL
);
CREATE TABLE object_tags (
object_id INT NOT NULL,
tag_name VARCHAR(32) NOT NULL,
PRIMARY KEY (object_id, tag_name)
);
Problem
I want to be able to query for all objects that are tagged with all of the tags in a given set. As an example, let’s say I have a live tree, a dead flower, an orangutan, and a ship as my objects, and I want to query all of those tagged living and plant. I expect to receive a list containing only the tree, assuming the tags match the characteristics of the objects.
Current Solution
Presently, given a list of tags T1, T2, …, Tn, I am solving the problem as follows:
- Select all
object_idcolumns from theobject_tagstable wheretag_nameis T1. - Join the result of (1) with the
object_tagstable, and select allobject_idcolumns wheretag_nameis T2. - Join the result of (2) with the
object_tagstable again, and select allobject_idcolumns wheretag_nameis T3. - Repeat as necessary for T4, …, Tn.
- Join the result of (4) with the
objectstable and select the additional columns of the objects that are needed.
In practice (using Java), I start with the query string for the first tag, then prepend/append the string parts for the second tag, and so on in a loop, before finally prepending/appending the string parts that make up the overall query. Only then does the string actually get passed into a PreparedStatement and get executed on the server.
Edit: Expanding on my example from above, using this solution I would issue the following query:
SELECT object_id FROM object_tags JOIN (
SELECT object_id FROM object_tags WHERE tag_name='living'
) AS _temp USING (object_id) WHERE tag_name='plant';
Question
Is there a better solution to this problem? Although the number of tags is not likely to be large, I am concerned about the performance of this solution, especially as the database grows in size. Furthermore, it is very difficult to read and maintain the code, especially when the additional concerns/constraints of the application are thrown in.
I am open to suggestions at any level, although the languages (MySQL and Java) are not variables at this point.
I don’t know about the performance of this solution, but you can simplify by using pattern matching in MySql to match a set of pipe-delimited tags (or any delimiter). This is a solution I’ve used before for similar applications with tag tables (@match would be a variable passed in by your Java code, I’ve harded coded a value for demonstration):
Here is a working demo: http://sqlize.com/0vP6DgQh0j