I have a table that gets updated very regularly throughout the day, so I'm looking for the most scalable method for updating rows. These updates happen in large batches, so each update may include around 1000 rows.
Currently, I’m looping through each of these 1000 rows and running a single update query for each one. While it doesn’t take long to execute, it just seems wasteful compared to one simple mass insert statement. So REPLACE INTO makes sense, since it’s basically deleting the old rows and inserting new ones, but how does that compare to a manual “delete where id in array” followed by a mass insert? Exactly the same? Slightly different? Is there a better method?
The key here is that these aren’t single row queries but mass row queries. So the question is, what is the most scalable way to run these updates? I say “scalable” and not “fastest” because these updates happen at regular intervals on a production server with active users, so speed is important but not at the cost of locking up the server.
You want to use InnoDB for this instead of MyISAM. Why? Because when you’re performing bulk inserts and deletes, wrapping the entire thing in a transaction can be a huge performance boost.
No matter what you end up doing to the data, that change alone could be huge.
With an appropriate transaction isolation level, your users could continue using the table while you change everything about it, only seeing the changes once you commit, without worrying about table locks.
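As a rough sketch of what wrapping the batch in a transaction might look like from application code, here it is in Python. The helper name apply_batch is made up for illustration, conn is assumed to be any DB-API 2.0 connection that is not in autocommit mode, and the placeholder style inside sql depends on your driver (many MySQL drivers use %s).

```python
def apply_batch(conn, sql, rows):
    """Run one parameterized statement per row inside a single transaction.

    Hypothetical helper: `conn` is assumed to be a DB-API 2.0 connection
    with autocommit turned off, `rows` a list of parameter tuples.
    """
    cur = conn.cursor()
    try:
        # All rows are sent before anything is committed.
        cur.executemany(sql, rows)
        # One commit for the whole batch instead of one per row.
        conn.commit()
    except Exception:
        # On any failure, undo the partial batch so readers never see it.
        conn.rollback()
        raise
    finally:
        cur.close()
```

Something like apply_batch(conn, "UPDATE scores SET score = %s WHERE id = %s", rows) would then commit the 1000 rows as one unit, where scores and its columns are placeholder names. Whether other sessions see the old or new rows in the meantime depends on the isolation level you pick.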
With regard to the actual data update, avoid deletes. Deletes are slow. Do updates instead, and touch only the rows that actually need to change. Avoid the REPLACE INTO magic as well, as it does a delete before an insert whenever the row already exists.
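One way to get a mass statement that updates in place rather than deleting is MySQL's INSERT ... ON DUPLICATE KEY UPDATE. That is my suggestion for illustration, not something named above. The Python sketch below only builds the statement text. build_upsert and the table and column names in the usage note are hypothetical, and the identifiers are interpolated directly, so they must come from trusted code, never from user input. It uses the VALUES() form in the update clause, which newer MySQL 8.0 releases deprecate in favor of a row alias, so check what your server version prefers.

```python
def build_upsert(table, key_column, columns, row_count):
    """Build a multi-row INSERT ... ON DUPLICATE KEY UPDATE statement.

    Hypothetical helper: returns SQL text with %s placeholders, one group
    per row. Identifiers are NOT escaped and must be trusted.
    """
    if row_count < 1:
        raise ValueError("row_count must be at least 1")
    if not columns:
        raise ValueError("need at least one non-key column to update")
    all_columns = [key_column] + list(columns)
    # One "(%s, %s, ...)" group per row in the batch.
    row_placeholder = "(" + ", ".join(["%s"] * len(all_columns)) + ")"
    values = ", ".join([row_placeholder] * row_count)
    # When the key already exists, overwrite only the non-key columns.
    assignments = ", ".join(f"{c} = VALUES({c})" for c in columns)
    return (
        f"INSERT INTO {table} ({', '.join(all_columns)}) "
        f"VALUES {values} "
        f"ON DUPLICATE KEY UPDATE {assignments}"
    )
```

For example, build_upsert("scores", "id", ["score"], 1000) yields one statement with 1000 placeholder groups. You would pass it to the driver along with a flat list of 2000 parameters, inside a transaction.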