SELECT commandid
FROM results
WHERE NOT EXISTS (
    SELECT *
    FROM generate_series(0,119999)
    WHERE generate_series = results.commandid
);
I have a column commandid of type int in results, but various tests failed and were never added to the table. I would like a query that returns the list of commandid values that are not found in results. I thought the query above would do what I wanted. However, it does not work even if I use a range outside the expected possible range of commandid (like negative numbers).
Given sample data:
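A minimal sketch of sample data for trying this out. The single-column table layout and the choice of which IDs to leave out are assumptions made for illustration, not the original test data:

```sql
-- Hypothetical layout: only the commandid column from the question is modelled.
CREATE TABLE results (commandid integer);

-- Load IDs 0..119999 but skip a handful to simulate tests that were never recorded.
-- With this filter the missing IDs are 17, 30017, 60017 and 90017.
INSERT INTO results (commandid)
SELECT i
FROM generate_series(0, 119999) AS s(i)
WHERE i % 30000 <> 17;
```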
This works:
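A sketch of the NOT EXISTS formulation, assuming the results table and commandid column from the question; the aliases s, i and r and the output name missing_commandid are illustrative. The key difference from the query in the question is that the generated series is the outer relation, so the query can return values that are absent from results, whereas a query that selects from results can only ever return IDs that are already there:

```sql
-- Every value in the expected range for which no matching row exists in results.
SELECT s.i AS missing_commandid
FROM generate_series(0, 119999) AS s(i)
WHERE NOT EXISTS (
    SELECT 1
    FROM results r
    WHERE r.commandid = s.i
)
ORDER BY s.i;
```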
as does this alternative formulation:
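A sketch of the outer-join formulation under the same assumptions (table results, column commandid; the aliases are illustrative). Each series value is joined to its matching row, if any, and only the values that found no match are kept:

```sql
-- Anti-join written as LEFT OUTER JOIN ... IS NULL.
SELECT s.i AS missing_commandid
FROM generate_series(0, 119999) AS s(i)
LEFT OUTER JOIN results r ON r.commandid = s.i
WHERE r.commandid IS NULL   -- no row in results matched this series value
ORDER BY s.i;
```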
Both of the above appear to result in identical query plans in my tests, but you should compare with your data on your database using EXPLAIN ANALYZE to see which is best.

Explanation
Note that instead of NOT IN I’ve used NOT EXISTS with a subquery in one formulation, and an ordinary OUTER JOIN in the other. It’s much easier for the DB server to optimise these, and it avoids the confusing issues that can arise with NULLs in NOT IN.

I initially favoured the OUTER JOIN formulation, but at least in 9.1 with my test data the NOT EXISTS form optimises to the same plan.

Both will perform better than the NOT IN formulation when the series is large, as in your case. NOT IN used to require Pg to do a linear search of the IN list for every tuple being tested, but examination of the query plan suggests Pg may be smart enough to hash it now. The NOT EXISTS (transformed into a JOIN by the query planner) and the JOIN work better.

The NOT IN formulation is both confusing in the presence of NULL commandids and can be inefficient, so I’d avoid it. With 1,000,000 rows the other two completed in 1.2 seconds and the NOT IN formulation ran CPU-bound until I got bored and cancelled it.
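For comparison, the NOT IN formulation discussed above would look roughly like the sketch below. It assumes the same results table and commandid column, with illustrative aliases. The second statement uses plain EXPLAIN, which shows the plan without running the query, so the plan can be inspected even when execution would be too slow:

```sql
-- NOT IN formulation, shown only for comparison.
-- Caveat: if any results.commandid is NULL this returns no rows at all,
-- because "x NOT IN (..., NULL)" can never evaluate to true.
SELECT s.i AS missing_commandid
FROM generate_series(0, 119999) AS s(i)
WHERE s.i NOT IN (SELECT commandid FROM results);

-- Inspect the plan without executing the statement.
EXPLAIN
SELECT s.i AS missing_commandid
FROM generate_series(0, 119999) AS s(i)
WHERE s.i NOT IN (SELECT commandid FROM results);
```

Prefixing the NOT EXISTS or OUTER JOIN query with EXPLAIN ANALYZE instead executes it and reports actual row counts and timings.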