I need to delete a huge chunk of data from my production database, which is about 100 GB in size. If possible, I would like to minimize downtime.
My selection criteria for the delete are likely to be:
DELETE * FROM POSTING WHERE USER.ID=5 AND UPDATED_AT<100
What is the best way to delete it?
- Build an index?
- Write a sequential script that deletes via paginating through the rows 1000 at a time?
The best way is to delete incrementally using a LIMIT clause (say, 10,000 rows per statement), without applying an ORDER BY. This lets MySQL flush the changes more often and keeps each transaction small. You can easily do this from any programming language you have installed that has a MySQL connector. Be sure to commit after each statement.
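For illustration, here is a minimal sketch of such a loop in Python. It assumes a DB-API connection from a MySQL connector that uses the `%s` placeholder style (PyMySQL and mysqlclient both do), and it assumes the filter column is really `POSTING.USER_ID`, since the `USER.ID` in the question does not resolve to anything. The function name and the `pause_seconds` parameter are made up for this example.

```python
import time


def delete_in_batches(conn, user_id=5, cutoff=100, batch_size=10000, pause_seconds=0.0):
    """Delete matching rows in small, separately committed batches.

    Returns the total number of rows deleted.
    """
    # Assumption: the filter column is POSTING.USER_ID (the question's USER.ID is not valid).
    # No ORDER BY, so MySQL can remove whichever matching rows it finds first.
    sql = (
        "DELETE FROM POSTING "
        "WHERE USER_ID = %s AND UPDATED_AT < %s "
        "LIMIT %s"
    )
    total = 0
    cur = conn.cursor()
    try:
        while True:
            cur.execute(sql, (user_id, cutoff, batch_size))
            deleted = cur.rowcount
            conn.commit()  # commit every batch so no transaction grows large
            total += deleted
            if deleted < batch_size:
                break  # a short batch means nothing is left to delete
            if pause_seconds:
                time.sleep(pause_seconds)  # optional breathing room for other traffic
    finally:
        cur.close()
    return total
```

You would call it as `delete_in_batches(conn)` with a connection you opened yourself. A small pause between batches is optional; it just gives other queries a chance to run if the server is busy.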
An index will definitely help, but building it on a 100 GB table will take a while as well (it is still worth creating if you are going to reuse the index in the future). By the way, your current query is incorrect because it references a table, USER, that is not listed in the statement. Define the index carefully so that the optimizer can actually use it for this query.
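If you do build the index, a composite one with the equality column first and the range column second is the usual shape for this kind of filter. The sketch below is only an assumption about your schema: it again treats the filter column as `POSTING.USER_ID`, the index name `idx_posting_user_updated` is invented, and `conn` is a DB-API connection from your MySQL connector. It creates the index and then returns the `EXPLAIN` rows, so you can check whether the optimizer picks the index.

```python
# Hypothetical index name; column names assume the filter is on POSTING.USER_ID.
INDEX_DDL = "CREATE INDEX idx_posting_user_updated ON POSTING (USER_ID, UPDATED_AT)"
EXPLAIN_SQL = "EXPLAIN DELETE FROM POSTING WHERE USER_ID = %s AND UPDATED_AT < %s"


def create_index_and_explain(conn, user_id=5, cutoff=100):
    """Create the composite index, then return the EXPLAIN rows for the delete."""
    cur = conn.cursor()
    try:
        cur.execute(INDEX_DDL)  # this can run for a long time on a 100 GB table
        cur.execute(EXPLAIN_SQL, (user_id, cutoff))
        return cur.fetchall()  # inspect the "key" column for the index name
    finally:
        cur.close()
```

If the `key` column of the plan does not show the new index, the optimizer is not using it for this delete.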