This seems so basic that I’m flabbergasted, for lack of a better word. I have two tables; let’s call them albums and artists:
CREATE TABLE `albums` (
`album_id` bigint(20) NOT NULL AUTO_INCREMENT,
`artist_id` bigint(20) DEFAULT NULL,
`name` varchar(200) NOT NULL,
PRIMARY KEY (`album_id`)
);
CREATE TABLE `artists` (
`artist_id` bigint(20) NOT NULL AUTO_INCREMENT,
`name` varchar(250) NOT NULL,
PRIMARY KEY (`artist_id`)
);
There are a few hundred thousand records in each table. Some of the album rows have a NULL artist_id; this is expected.
However, when I perform the following query to find artists without albums:
SELECT * FROM artists WHERE artist_id NOT IN (SELECT artist_id FROM albums)
… the query returns zero results. I know that this is not true. So I tried this one:
SELECT * FROM artists WHERE artist_id NOT IN (SELECT artist_id FROM albums WHERE artist_id IS NOT NULL)
… and I get back a couple thousand rows. My question is: why did the first query seem to operate on the idea that any number = NULL? Or is this an odd effect that NULL has on the IN() operator? I feel like this is something basic that I’ve missed. I don’t usually use NULL in my db tables at all.
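This is why NOT EXISTS is semantically correct.
Logic:
NOT IN (x, y, NULL) is actually NOT (x OR y OR NULL), which is actually (NOT x) AND (NOT y) AND (NOT NULL).
Here x and y stand for the comparisons against each listed value. NOT NULL evaluates to NULL (unknown), so the AND chain can be false or unknown but never true.
So NULL invalidates the whole NOT IN.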
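To see the effect on a tiny data set, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite applies the same three-valued NULL logic to NOT IN as MySQL does, which should hold for this case; the table contents and the aliases a and b are made up for illustration. It runs the original NOT IN query next to a NOT EXISTS version.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (artist_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE albums (album_id INTEGER PRIMARY KEY, artist_id INTEGER, name TEXT NOT NULL);
    -- Hypothetical sample rows: one artist with an album, one without,
    -- and one album whose artist_id is NULL.
    INSERT INTO artists VALUES (1, 'Has album'), (2, 'No album');
    INSERT INTO albums VALUES (10, 1, 'Debut'), (11, NULL, 'Unknown artist');
""")

# 2 NOT IN (1, NULL) evaluates to NULL (unknown), not TRUE.
not_in_value = conn.execute("SELECT 2 NOT IN (1, NULL)").fetchone()[0]

# The query from the question: the NULL in the subquery result means
# the WHERE condition is never TRUE for any artist.
not_in_rows = conn.execute(
    "SELECT name FROM artists "
    "WHERE artist_id NOT IN (SELECT artist_id FROM albums)"
).fetchall()

# NOT EXISTS only asks whether a matching album row exists, so a NULL
# artist_id in albums simply never matches and does no harm.
not_exists_rows = conn.execute(
    """
    SELECT name FROM artists a
    WHERE NOT EXISTS (
        SELECT 1 FROM albums b WHERE b.artist_id = a.artist_id
    )
    """
).fetchall()

print(not_in_value)     # None
print(not_in_rows)      # []
print(not_exists_rows)  # [('No album',)]
```

The NOT EXISTS form returns the artist without albums without needing the extra IS NOT NULL filter inside the subquery.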