I’m working on a search engine for an online library, but I’m stuck. When searching by tags, OR searches (i.e. books with “tag1” OR “tag2”) work fine, but AND searches are giving me trouble.
The tables (and their columns) I use for this are:
books | book_id, other_info
tagmap | map_id, book_id, tag_id
tags | tag_id, tag_text
Since a bunch of other search options can be en/disabled by the user, the query is generated by PHP. When searching for books with the tags “tag1” AND “tag2”, the following query is generated:
SELECT DISTINCT b.book_id, b.other_info
FROM books b, tagmap tm, tags t
WHERE b.book_id = "NA"
OR ( (t.tag_text IN ("tag1", "tag2"))
AND tm.tag_id = t.tag_id
AND b.book_id = tm.book_id )
HAVING COUNT(tm.book_id)=2
The WHERE line (which doesn’t give any results) is there so that additional parameters may be strung to the query more easily. I know this can be handled a lot nicer, but for now that doesn’t matter.
When doing an OR search (same query but without the HAVING COUNT line), it returns the two books in the database that have either of those tags, but when searching for the one book in the database that has BOTH tags, it returns nothing.
What’s wrong with the query? Is this not the/a way to do it? What am I overlooking?
Thanks!
EDIT: As per request, the data from each table relating to the book that should be returned:
books table:
book_id 110
tagmap table:
book_id 110 110
tag_id 15 16
tags table:
tag_id 15 16
tag_text tag1 tag2
SOLUTION: All I had to do was include
GROUP BY b.book_id
before the HAVING COUNT line. Simple as that. The answer provided by taz is also worth looking into, especially if you’re aiming for optimising your search queries.
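For anyone who wants to reproduce the fix, here is a minimal sketch of the corrected query run against an in-memory SQLite database from Python. SQLite, the second book (111), and the other_info values are assumptions made purely for illustration; the original setup is generated by PHP and its database is not stated. The idea it shows: with GROUP BY b.book_id, the COUNT in the HAVING line is evaluated once per book rather than once over the whole joined result.

```python
import sqlite3

# In-memory stand-in for the three tables from the question.
# Book 111 and the other_info strings are hypothetical test data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books  (book_id INTEGER PRIMARY KEY, other_info TEXT);
    CREATE TABLE tags   (tag_id  INTEGER PRIMARY KEY, tag_text TEXT);
    CREATE TABLE tagmap (map_id  INTEGER PRIMARY KEY, book_id INTEGER, tag_id INTEGER);
    INSERT INTO books VALUES (110, 'has both tags'), (111, 'has tag1 only');
    INSERT INTO tags  VALUES (15, 'tag1'), (16, 'tag2');
    INSERT INTO tagmap (book_id, tag_id) VALUES (110, 15), (110, 16), (111, 15);
""")

# Same shape as the generated query, plus the GROUP BY line.
# Single-quoted string literals are used here for portability.
AND_QUERY = """
    SELECT DISTINCT b.book_id, b.other_info
    FROM books b, tagmap tm, tags t
    WHERE b.book_id = 'NA'
       OR ( t.tag_text IN ('tag1', 'tag2')
            AND tm.tag_id = t.tag_id
            AND b.book_id = tm.book_id )
    GROUP BY b.book_id
    HAVING COUNT(tm.book_id) = 2
"""

and_rows = conn.execute(AND_QUERY).fetchall()
print(and_rows)  # only the book that carries both tags
```

One caveat with this approach: COUNT(tm.book_id) = 2 only means “both tags” if a book cannot be mapped to the same tag twice in tagmap.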
The comma-separated list of tables in your FROM clause functions like an inner join, so your query is selecting all of the rows in the tagmap table and the tags table that have the same tag ID, and of those rows, all of the rows from the books table and the tagmap table that have the same book ID. The HAVING clause then requires that two rows be returned from that result set with the same book ID. There can only be one row in the books table with any given book ID (assuming book ID is the primary key of the books table), so this condition is never met.
What you want is a join without the books table. You are looking for the same book ID appearing twice in the results of the OR clauses (I believe), so you don’t want to join the books table with those results because that will ensure you can never have the same book ID in the results more than once.
Edit: conceptually, you are combining two different things. You are looking for tags and tagmap rows for the same book, and you are also getting the book info for each of those books. So you are actually pulling duplicate other_info data for every instance of the same book ID in the tagmap table, and then using DISTINCT to reduce that duplicate data down to one row, because all you want is the book ID and other_info. I would consider using two queries or a subquery to do this. There may be other [better] ways as well. I’d have to play around with it to figure it out.
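As a rough illustration of the subquery idea, the sketch below first finds the book IDs that carry every requested tag using only tagmap and tags, and then looks up the book info in a separate outer step. It is written in Python against an in-memory SQLite database so it can be run as-is; SQLite, the helper name books_with_all_tags, book 111, and the other_info values are all assumptions for illustration, not part of the original setup.

```python
import sqlite3

def books_with_all_tags(conn, tags):
    """Return (book_id, other_info) for books that have every tag in `tags`."""
    tags = list(dict.fromkeys(tags))  # drop duplicate tag names, keep order
    if not tags:
        return []
    placeholders = ", ".join("?" * len(tags))  # e.g. "?, ?"
    sql = f"""
        SELECT b.book_id, b.other_info
        FROM books b
        WHERE b.book_id IN (
            -- inner step: tag matching only, no books table involved
            SELECT tm.book_id
            FROM tagmap tm
            JOIN tags t ON t.tag_id = tm.tag_id
            WHERE t.tag_text IN ({placeholders})
            GROUP BY tm.book_id
            HAVING COUNT(DISTINCT t.tag_id) = ?
        )
        ORDER BY b.book_id
    """
    return conn.execute(sql, (*tags, len(tags))).fetchall()

# Hypothetical test data: book 110 is from the question, book 111 is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books  (book_id INTEGER PRIMARY KEY, other_info TEXT);
    CREATE TABLE tags   (tag_id  INTEGER PRIMARY KEY, tag_text TEXT);
    CREATE TABLE tagmap (map_id  INTEGER PRIMARY KEY, book_id INTEGER, tag_id INTEGER);
    INSERT INTO books VALUES (110, 'has both tags'), (111, 'has tag1 only');
    INSERT INTO tags  VALUES (15, 'tag1'), (16, 'tag2');
    INSERT INTO tagmap (book_id, tag_id) VALUES (110, 15), (110, 16), (111, 15);
""")

print(books_with_all_tags(conn, ["tag1", "tag2"]))
print(books_with_all_tags(conn, ["tag1"]))
```

Because the outer query reads each book at most once, no DISTINCT is needed, and COUNT(DISTINCT t.tag_id) keeps the match correct even if tagmap happens to contain a duplicate mapping. Passing the tag names as bound parameters also avoids splicing user input into the generated SQL.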
For starters, try