Can someone please help me understand the following behavior that occurs when I add a WHERE clause to a query that has a LEFT JOIN with COUNT(*)?
I have two tables:
TABLE 1: customers
customer_id | name
------------------
1 | Bob
2 | James
3 | Fred
TABLE 2: orders
order_id | customer_id | order_timestamp
----------------------------------------
1000 | 1 | 2011-01-01 00:00
1001 | 1 | 2011-01-05 00:00
1002 | 2 | 2011-01-10 00:00
Now the following query tells me how many orders each customer placed:
select c.customer_id, count(o.order_id)
from customers c
left join orders o using (customer_id)
group by 1
customer_id | count
-------------------
1 | 2
2 | 1
3 | 0
This works great BUT if I add a WHERE clause to the query, the query no longer outputs counts of zeroes for customers who did not place any orders even though I’m doing a LEFT JOIN:
select c.customer_id, count(o.order_id)
from customers c
left join orders o using (customer_id)
where o.order_timestamp >= '2011-01-05'
group by 1
customer_id | count
-------------------
1 | 1
2 | 1
Now if I move the WHERE condition as part of the LEFT JOIN like the following, I get back my zero counts for customers who did not place orders:
select c.customer_id, count(o.order_id)
from customers c
left join orders o on (c.customer_id = o.customer_id) and (o.order_timestamp >= '2011-01-05')
group by 1
I’m confused at why the second query does not work, but the third one does? Can someone please provide me with an explanation? Also not sure if this matters, but I’m using postgres. Thanks!
This is because NULL is not greater than or equal to anything; If you change your WHERE clause to
where o.order_timestamp is null or o.order_timestamp >= '2011-01-05'then you will get the same behavior as your join clause limit.Note though – I would recommend the join clause approach, as it matches more closely what you are trying to do. Also the change to the WHERE clause I mentioned above will only work if the order_timestamp column is not nullable — if it is then you should use a different column for the null check (eg,
where o.primarykey is null or o.order_timestamp >= '2011-01-05').