I’m trying to run a query to select a customer audience, but it should only select customers who haven’t been sent an email before. The email tracking is stored in another table. This is the original query:
SELECT
c.customers_firstname,
c.customers_lastname,
o.orders_id,
o.customers_id,
c.customers_email_address
FROM
orders o,
customers c,
order_status s
WHERE
o.customers_id = c.customers_id
AND o.orders_id = s.orders_id
AND o.orders_status = s.orders_status_id
ORDER BY
o.orders_id ASC
Now, I need to check another table called tracking and see if the customer already exists in that table and if so, skip it.
This is what I’ve tried, but it doesn’t seem to work:
SELECT
c.customers_firstname,
c.customers_lastname,
o.orders_id,
o.customers_id,
c.customers_email_address
FROM
orders o,
customers c
INNER JOIN
tracking t
ON
c.customers_id = t.customers_id,
order_status s
WHERE
o.customers_id = c.customers_id
AND o.orders_id = s.orders_id
AND o.orders_status = s.orders_status_id
AND c.customers_id NOT LIKE t.customers_id
ORDER BY
o.orders_id ASC
What am I doing wrong? Or is there any way to do this better?
ADDED: I totally forgot one more important factor: the tracking table has a “module” column, and I need results only from the “contact” module. In other words, I need to filter out customers who already exist in the tracking table, but only if they are associated with the contact module, not any other module.
This is equivalent to your original query:
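One way to write it with explicit JOIN syntax is sketched below. It only uses the table and column names from the question and moves the join conditions out of the WHERE clause into ON clauses; it has not been run against the actual schema.

```sql
-- Same joins as the comma-style query in the question,
-- written with explicit JOIN ... ON syntax.
SELECT c.customers_firstname,
       c.customers_lastname,
       o.orders_id,
       o.customers_id,
       c.customers_email_address
FROM orders o
JOIN customers c
  ON c.customers_id = o.customers_id
JOIN order_status s
  ON s.orders_id = o.orders_id
 AND s.orders_status_id = o.orders_status
ORDER BY o.orders_id ASC;
```

Keeping each join condition next to the table it belongs to makes it easier to add another join later without mixing comma joins and JOIN keywords, which is what breaks the second query in the question.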
Add an anti-join
To meet your specification, you can use an “anti-join” pattern. We can add this to the query, before the ORDER BY clause:
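A sketch of that clause, assuming the tracking table has the customers_id and module columns described in the question, and assuming the module value is stored as the string 'contact':

```sql
-- Fragment only: goes after the last JOIN and before ORDER BY.
LEFT JOIN tracking t
  ON t.customers_id = c.customers_id
 AND t.module = 'contact'      -- assumption: module is stored as this string
WHERE t.customers_id IS NULL   -- keep only customers with no matching tracking row
```

The module condition has to sit in the ON clause rather than the WHERE clause. In the WHERE clause it would discard the all-NULL rows that the outer join generates for unmatched customers, which are exactly the rows we want to keep.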
What that’s going to do is find all matching rows from the tracking table, based on the customers_id. For any rows where the query doesn’t find a matching row in the tracking table, it will generate a dummy row from tracking which consists of all NULL values. (That’s one way of describing what an OUTER JOIN does.)
The “trick” now is to throw out all the rows that matched. And we do that by checking for a NULL value of customers_id from the tracking table (in the WHERE clause). For a match, that column won’t be NULL. (The equals comparison in the join predicate guarantees us that.) So we know that if we get a NULL value for t.customers_id, there wasn’t a match.
So, this query returns the specified result set:
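Put together, the full statement might look like the sketch below. The names come from the question; the 'contact' literal is an assumption about how the module value is stored, and the query has not been run against the real tables.

```sql
SELECT c.customers_firstname,
       c.customers_lastname,
       o.orders_id,
       o.customers_id,
       c.customers_email_address
FROM orders o
JOIN customers c
  ON c.customers_id = o.customers_id
JOIN order_status s
  ON s.orders_id = o.orders_id
 AND s.orders_status_id = o.orders_status
-- Anti-join: look for a 'contact' tracking row for this customer...
LEFT JOIN tracking t
  ON t.customers_id = c.customers_id
 AND t.module = 'contact'      -- assumption: module is stored as this string
-- ...and keep the row only when none was found.
WHERE t.customers_id IS NULL
ORDER BY o.orders_id ASC;
```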
Other approaches
There are other approaches, but the anti-join is frequently the best performer.
Some other options are a NOT EXISTS predicate and a NOT IN predicate. I can add those, though I expect those solutions will be provided in other answers before I get around to it.
Starting with that first query (equivalent to the query in your question), we could also use a NOT EXISTS predicate, added before the ORDER BY clause. A NOT IN predicate goes in the same place.
(You may have some guarantee that tracking.customers_id is not null, but in the more general case, it’s important that the NOT IN subquery NOT return a NULL value, so we include a WHERE clause in the subquery so that we have that guaranteed.)
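Both predicates might be sketched as follows. Each is a fragment meant to be placed before the ORDER BY clause of the JOIN-style query, and only one of them would be used at a time. The module = 'contact' condition reflects the ADDED note in the question and assumes the value is stored as that string.

```sql
-- Option 1: NOT EXISTS with a correlated subquery
WHERE NOT EXISTS (
        SELECT 1
        FROM tracking t
        WHERE t.customers_id = c.customers_id
          AND t.module = 'contact'
      )

-- Option 2: NOT IN, with NULLs excluded from the subquery
WHERE c.customers_id NOT IN (
        SELECT t.customers_id
        FROM tracking t
        WHERE t.customers_id IS NOT NULL
          AND t.module = 'contact'
      )
```

The IS NOT NULL guard matters because if the subquery returns even one NULL, the NOT IN comparison can never evaluate to true, and the query returns no rows at all.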
With appropriate indexes, the anti-join pattern usually performs better than either the NOT EXISTS or the NOT IN, but not always.