Let’s say I have a list of primary keys, and for each of those rows one value needs updating. Is it better to run:
-- run 10,000 of these queries
UPDATE mytable SET myflag = 1 WHERE id = [each_id]
Or combine updates into batch queries like this:
-- run 100 of these queries, where the IN () list contains about 100 elements
UPDATE mytable SET myflag = 1 WHERE id IN (3, 4, 5, 9, 99, ... 7887 )
How about 100 queries with 100 IN () items?
Neither. In PostgreSQL I would instead:
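A minimal sketch of such a statement, consistent with the CTE and unnest() approach described below. It assumes id is an integer column; the array literal is abbreviated to a few placeholder values and would hold all 10000 IDs in practice.

```sql
-- Sketch: update every listed row in a single statement.
-- Assumption: mytable.id is an integer column with an index on it.
-- The array literal is shortened; the real list would carry all IDs.
WITH x AS (
   SELECT unnest('{3,4,5,9,99,7887}'::int[]) AS id  -- array is parsed once
   )
UPDATE mytable t
SET    myflag = 1
FROM   x
WHERE  t.id = x.id;  -- plain join on id, so the index on mytable.id can be used

-- Equivalent form with a subquery instead of a CTE:
UPDATE mytable t
SET    myflag = 1
FROM  (SELECT unnest('{3,4,5,9,99,7887}'::int[]) AS id) x
WHERE  t.id = x.id;
```

Either form sends one statement to the server instead of thousands, and only one of the two would be run for a given list.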
I put so many IDs in my example to give a visual clue that 10000 IDs is a lot. The two ideas presented in the question would either:
1. have to parse the list, put together 10000 statements and send them to the server, which may very well take longer than the UPDATEs themselves.
2. have to search in a list (array) of 10000 items for each individual id in mytable for a matching id. Standard indexes can’t be used. This will be very slow. Performance degrades with the size of mytable.
An index on mytable.id is all the presented alternative needs to outperform both variants by an order of magnitude.
The CTE parses the array once, and unnest() is rather fast with that. A subquery works, too, which matters for MySQL: versions before 8.0 have no CTEs. Doing it all in one statement beats 10000 statements by an order of magnitude. Add another order of magnitude if those statements are run in individual transactions. Add another one if you should use individual sessions.
Rare exceptions apply for databases with locking issues under heavy write load. Just benchmark as has been advised. EXPLAIN ANALYZE is your friend in PostgreSQL.
If the operation grows huge, and most of the table is updated and / or you are running low on disk space or RAM, it may still be a good idea to split the operation into several logical chunks. Just not too many: find the sweet spot. The main reason for chunking is to let HOT updates recycle table bloat from previous UPDATE runs. Consider this related question.
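For the benchmarking advice above, one possible way to time the single-statement variant is sketched here. EXPLAIN ANALYZE actually executes the UPDATE, so this sketch wraps it in a transaction that is rolled back. The array literal is again an abbreviated placeholder, and id is assumed to be an integer column.

```sql
-- Sketch: measure the UPDATE without keeping its changes.
BEGIN;

EXPLAIN ANALYZE          -- runs the statement and reports actual timings
WITH x AS (
   SELECT unnest('{3,4,5,9,99,7887}'::int[]) AS id  -- placeholder list
   )
UPDATE mytable t
SET    myflag = 1
FROM   x
WHERE  t.id = x.id;

ROLLBACK;                -- discard the changes made by the test run
```

The same wrapper can be put around a single chunk of the ID list to compare different chunk sizes when looking for the sweet spot.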