As a more general case of this question, because I think it may be of interest to more people: what’s the best way to perform a fulltext search on two tables? Assume there are three tables: one for programs (with submitter_id) and one each for tags and descriptions, each with an object_id foreign key referring to records in programs. We want the submitter_id of programs with certain text in their tags OR descriptions. We have to use MATCH AGAINST for reasons that I won’t go into here. Don’t get hung up on that aspect.
    programs:              id, submitter_id
    tags_programs:         object_id, text
    descriptions_programs: object_id, text
The following works and executes in 20 ms or so:
    SELECT p.submitter_id
    FROM programs p
    WHERE p.id IN (
        SELECT t.object_id
        FROM tags_programs t
        WHERE MATCH (t.text) AGAINST ('china')
        UNION ALL
        SELECT d.object_id
        FROM descriptions_programs d
        WHERE MATCH (d.text) AGAINST ('china')
    )
But when I tried to rewrite this as a JOIN, as follows, it runs for a very long time. I have to kill it after 60 seconds.
    SELECT p.id
    FROM descriptions_programs d, tags_programs t, programs p
    WHERE (d.object_id = p.id AND MATCH (d.text) AGAINST ('china'))
       OR (t.object_id = p.id AND MATCH (t.text) AGAINST ('china'))
Just out of curiosity, I replaced the OR with AND. That also runs in a few milliseconds, but it’s not what I need. What’s wrong with the second query above? I can live with the UNION and subselects, but I’d like to understand.
Join after the filters (i.e. join the results); don’t try to join and then filter.
The reason is that you lose use of your fulltext index.
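
One way to check whether the index is being used, assuming this is MySQL and that both text columns have FULLTEXT indexes (neither is stated outright in the question), is to ask the server for its plan. EXPLAIN only reports the plan and does not run the query, so it is safe to use on the slow form. The table names below follow the schema listed in the question.

```sql
-- Plan for the slow form: three tables in one FROM list,
-- with the two join-plus-match conditions combined by OR.
EXPLAIN
SELECT p.id
FROM descriptions_programs d, tags_programs t, programs p
WHERE (d.object_id = p.id AND MATCH (d.text) AGAINST ('china'))
   OR (t.object_id = p.id AND MATCH (t.text) AGAINST ('china'));

-- Plan for a single filter on its own, for comparison.
EXPLAIN
SELECT d.object_id
FROM descriptions_programs d
WHERE MATCH (d.text) AGAINST ('china');
```

In MySQL's EXPLAIN output, a type of fulltext means the rows are read through a FULLTEXT index, while ALL means a full table scan. I would expect the single filter to show fulltext and the OR form to show scans, but the actual plan depends on the server version and the data, so treat that as something to verify. Note also that with OR, neither join condition has to hold on every row, so the comma-separated FROM list can pair each matching row from one table with every row of the other.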
Clarification in response to the comment: I’m using the word join generically here, not as JOIN but as a synonym for merge or combine.
I’m essentially saying you should use the first (faster) query, or something like it. The reason it’s faster is that each of the subqueries is sufficiently uncluttered that the db can use that table’s full text index to do the select very quickly. Joining the two (presumably much smaller) result sets (with UNION) is also fast. This means the whole thing is fast.
The slow version winds up walking through lots of data, testing it to see if it’s what you want, rather than quickly winnowing the data down and only searching through rows you are likely to actually want.
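
If an explicit JOIN is still wanted, a minimal sketch of "filter first, then join" could look like the following. It assumes MySQL, FULLTEXT indexes on both text columns, and that programs.id is unique; the alias hits is a name made up for this example. It is one possible shape, not the only one.

```sql
-- Each branch filters a single table, so each can use that table's
-- FULLTEXT index. UNION (rather than UNION ALL) removes duplicate
-- object_ids, so a program matching in both tables is joined only once.
SELECT p.submitter_id
FROM (
    SELECT t.object_id
    FROM tags_programs t
    WHERE MATCH (t.text) AGAINST ('china')
    UNION
    SELECT d.object_id
    FROM descriptions_programs d
    WHERE MATCH (d.text) AGAINST ('china')
) AS hits
JOIN programs p ON p.id = hits.object_id;
```

Like the IN version, this returns one row per matching program, so the same submitter_id can appear more than once; use SELECT DISTINCT p.submitter_id if each submitter should be listed only once.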