I have a voting system and I am trying to write a MySQL query that will detect which votes are completed so that an email can be sent to the vote’s creator. Votes are complete when (1) their time runs out (already solved that one easily) or (2) all of the voters have voted.
There are two tables relevant to this. The first table is “votes” where each vote is described and has a unique “vote_id”. The second table is “tickets”. At the vote’s creation, each participant has a ticket created (which has some authentication information). Each ticket has a “vote_id” field which corresponds to that in the “votes” table. So basically, as people vote their corresponding ticket is deleted from the tickets table. This means that the number of rows in “tickets” of a given “vote_id” corresponds to the number of people who didn’t vote.
At first I went to do something like this:
SELECT votes.vote_id
FROM votes, tickets
WHERE votes.vote_id=tickets.vote_id
AND (votes.completion_timestamp < NOW())
HAVING (COUNT(tickets.vote_id) = 0)
But then I realized that, because of the “votes.vote_id=tickets.vote_id” line, the votes that have no outstanding tickets would presumably be ignored. I can think of a lot of inefficient ways to do this, but I would imagine there is a way to do it directly in MySQL?
Generalized summary of question: Given two tables A and B with a common field F, how do I find all F in A that are not present in B?
To do this efficiently in MySQL requires a trick:
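The query that went with this answer is not reproduced here. One well-known form of such a trick is an anti-join: left-join the second table and keep only the rows where no match was found. The following is only a sketch of that idea, assuming the “votes” and “tickets” tables and the “vote_id” column described in the question; it may not be the exact query the answer had in mind.

```sql
-- Sketch (assumed, not the original answer's query):
-- votes with no remaining tickets, i.e. everyone has voted.
SELECT v.vote_id
FROM votes AS v
LEFT JOIN tickets AS t
       ON t.vote_id = v.vote_id
WHERE t.vote_id IS NULL;  -- no matching ticket row was found
```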
The SQL that you have is not quite right. The following version should work:
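The corrected query is not shown here. Judging from the discussion of EXISTS that follows, a version along these lines seems likely; treat it as a sketch under that assumption, using the table and column names from the question.

```sql
-- Sketch (assumed): a vote is returned only if no ticket row
-- still references it, so votes with zero tickets are not lost.
SELECT v.vote_id
FROM votes AS v
WHERE NOT EXISTS (
    SELECT 1
    FROM tickets AS t
    WHERE t.vote_id = v.vote_id
);
```

The time-based case from the question (completion_timestamp < NOW()) could be added to the outer WHERE with OR if both kinds of completed vote should come back from one query.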
The use of EXISTS vs IN with a subquery is discussed extensively in the MySQL documentation (http://dev.mysql.com/doc/refman/5.0/en/subquery-optimization-with-exists.html). The difference versus a left outer join rests on two things: join strategy and increased I/O.
I do not know if the join strategy is different for the left outer join. I speculate that it shouldn’t be worse than for the EXISTS version. The second point, though, is that the left outer join creates an output set that potentially multiplies the number of rows. The EXISTS version cannot do this.
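To make the row-multiplication point concrete, here is a small sketch using the tables from the question. Without any filter, the left outer join returns one row per outstanding ticket, so a vote with three outstanding tickets appears three times before an IS NULL condition would discard those rows.

```sql
-- Illustration only: the unfiltered left outer join.
-- A vote with N outstanding tickets yields N rows here;
-- a vote with none yields one row with ticket_vote_id = NULL.
SELECT v.vote_id, t.vote_id AS ticket_vote_id
FROM votes AS v
LEFT JOIN tickets AS t
       ON t.vote_id = v.vote_id;
```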
After reading the documentation, it is possible that the following would be more efficient still:
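The query itself is not shown here. Given the remark about the limit, it was presumably the NOT EXISTS form with a LIMIT 1 inside the subquery; what follows is a sketch under that assumption. Whether it actually helps depends on the optimizer, since EXISTS only needs to find one matching row anyway.

```sql
-- Sketch (assumed): same anti-join as a NOT EXISTS test,
-- with an explicit LIMIT 1 inside the subquery.
SELECT v.vote_id
FROM votes AS v
WHERE NOT EXISTS (
    SELECT 1
    FROM tickets AS t
    WHERE t.vote_id = v.vote_id
    LIMIT 1
);
```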
The limit should short-circuit any evaluation beyond the first row encountered.