Let’s say I’m building a contact management system with a USER table and a CONTACT_INFO table.
For every USER, I can have zero or more CONTACT_INFO records. I’ve set this up with a foreign key in my CONTACT_INFO table that points to the relevant USER record.
I would like to search for all the USER records that do not have any CONTACT_INFO records.
I expect that this could be done:
SELECT * FROM user u WHERE u.user_id NOT IN (SELECT DISTINCT c.user_id FROM contact_info c);
My concern is that as the tables grow, this query’s performance can degrade significantly.
One idea I’m playing with is to add a column to the USER table that records whether the user has any CONTACT_INFO records. I was also wondering: if the DBMS has to verify that the referenced USER record exists whenever I insert a record into CONTACT_INFO, then it is already accessing that USER record for the check, so updating it at the same time should not be that costly, performance-wise.
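A minimal sketch of that flag-column idea, assuming SQLite (driven from Python's `sqlite3` module) and triggers to keep the flag in sync. The `has_contact_info` column, the `email` column, the trigger names, and the sample rows are all hypothetical; other DBMSs use different trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    user_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    has_contact_info INTEGER NOT NULL DEFAULT 0  -- hypothetical flag column
);
CREATE TABLE contact_info (
    contact_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES user(user_id),
    email TEXT
);
CREATE INDEX idx_contact_info_user_id ON contact_info(user_id);

-- Set the flag whenever a contact row is added for a user.
CREATE TRIGGER contact_info_ai AFTER INSERT ON contact_info
BEGIN
    UPDATE user SET has_contact_info = 1 WHERE user_id = NEW.user_id;
END;

-- After a delete, recompute the flag: the user may still have other rows.
CREATE TRIGGER contact_info_ad AFTER DELETE ON contact_info
BEGIN
    UPDATE user
    SET has_contact_info = EXISTS (
        SELECT 1 FROM contact_info c WHERE c.user_id = OLD.user_id
    )
    WHERE user_id = OLD.user_id;
END;
""")

conn.executemany(
    "INSERT INTO user (user_id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)
conn.execute("INSERT INTO contact_info (user_id, email) VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO contact_info (user_id, email) VALUES (2, 'b@example.com')")
conn.execute("DELETE FROM contact_info WHERE user_id = 2")

# The search becomes a plain filter on the flag column.
users_without_contact = [
    row[0]
    for row in conn.execute(
        "SELECT user_id FROM user WHERE has_contact_info = 0 ORDER BY user_id"
    )
]
print(users_without_contact)
```

The trade-off is extra write work and one more thing that can drift out of sync. This sketch does not handle an UPDATE that moves a CONTACT_INFO row to a different user, which would need a third trigger.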
As always, feedback is appreciated.
From my tests, the following is faster than BradC’s method:
This may be because the compiler does have to do the conversion itself, I don’t know.
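For comparison, here is a sketch of three common ways to write the "users with no contact info" search: `NOT IN`, `NOT EXISTS`, and `LEFT JOIN ... IS NULL`. These are not necessarily the queries that were timed above; the schema, the `email` column, and the sample rows are assumptions, and the example runs on SQLite through Python's `sqlite3` module. Relative speed differs between DBMSs, so it is worth timing all three on your own data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE contact_info (
    contact_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES user(user_id),
    email TEXT
);
INSERT INTO user (user_id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO contact_info (user_id, email)
VALUES (1, 'a@example.com'), (1, 'a2@example.com');
""")

QUERIES = {
    # Subquery form. If contact_info.user_id could be NULL, NOT IN would
    # return no rows at all, which is why the column is NOT NULL here.
    "not_in": """
        SELECT u.user_id FROM user u
        WHERE u.user_id NOT IN (SELECT c.user_id FROM contact_info c)
        ORDER BY u.user_id
    """,
    # Correlated anti-join; unaffected by NULLs in the subquery.
    "not_exists": """
        SELECT u.user_id FROM user u
        WHERE NOT EXISTS (
            SELECT 1 FROM contact_info c WHERE c.user_id = u.user_id
        )
        ORDER BY u.user_id
    """,
    # Outer join, keeping only users for which no contact row matched.
    "left_join": """
        SELECT u.user_id FROM user u
        LEFT JOIN contact_info c ON c.user_id = u.user_id
        WHERE c.user_id IS NULL
        ORDER BY u.user_id
    """,
}

results = {name: [r[0] for r in conn.execute(sql)] for name, sql in QUERIES.items()}
for name, ids in results.items():
    print(name, ids)
```

All three return the same set of users on this data; only the execution plan can differ.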
Le Dorfier is correct in principle, though: if you have set up your indexes right on the database (i.e. both user_id columns should be indexed), both your answer and most of these responses here will be extremely fast, regardless of how many records you have in your database.
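As a rough illustration of the indexing point, the sketch below creates an index on the foreign-key column and asks the engine for its query plan. It assumes SQLite via Python's `sqlite3` module; the index name and the `email` column are made up. In SQLite an `INTEGER PRIMARY KEY` needs no separate index, while the `user_id` column in CONTACT_INFO does, and several other DBMSs likewise do not index foreign-key columns automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE contact_info (
    contact_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES user(user_id),
    email TEXT
);
-- Index the foreign-key column so per-user lookups avoid a full table scan.
CREATE INDEX idx_contact_info_user_id ON contact_info(user_id);
""")

# List the explicitly created indexes on contact_info.
index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'contact_info'"
    )
]

# Ask SQLite how it would run the search; the last column of each row
# is a human-readable description of one plan step.
plan = [
    row[-1]
    for row in conn.execute("""
        EXPLAIN QUERY PLAN
        SELECT u.user_id FROM user u
        WHERE NOT EXISTS (
            SELECT 1 FROM contact_info c WHERE c.user_id = u.user_id
        )
    """)
]
for step in plan:
    print(step)
```

Reading the printed plan shows whether the lookup into CONTACT_INFO goes through the index or scans the table. Other systems expose the same information through their own `EXPLAIN` variants.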
Incidentally, if you’re looking for a way to get a query that lists users along with a “HasContactInfo” boolean value, you could do something like this:
This second solution may not be useful in your case, but I found it to be much faster than some simpler queries that I had assumed would get optimized automatically.
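One way a per-user "HasContactInfo" listing could look is sketched below, using a `CASE WHEN EXISTS` expression in the select list. This is an assumption about the general shape, not a reconstruction of the query referred to above; the schema, the `email` column, and the sample rows are hypothetical, and it runs on SQLite through Python's `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE contact_info (
    contact_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES user(user_id),
    email TEXT
);
CREATE INDEX idx_contact_info_user_id ON contact_info(user_id);
INSERT INTO user (user_id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO contact_info (user_id, email)
VALUES (1, 'a@example.com'), (1, 'a2@example.com');
""")

# One row per user, with 1 if at least one contact row exists and 0 otherwise.
# EXISTS can stop at the first match, so users with many contact rows
# do not produce duplicate output rows the way a plain join would.
rows = conn.execute("""
    SELECT u.user_id,
           u.name,
           CASE WHEN EXISTS (
               SELECT 1 FROM contact_info c WHERE c.user_id = u.user_id
           ) THEN 1 ELSE 0 END AS HasContactInfo
    FROM user u
    ORDER BY u.user_id
""").fetchall()

for user_id, name, has_contact_info in rows:
    print(user_id, name, bool(has_contact_info))
```

Filtering this result on `HasContactInfo = 0` gives the same users as the original search, so the one query can serve both purposes.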