I have a MySQL table with 237 million rows. I want to process all of these rows and update them with new values.
I do have sequential ID’s, so I could just use a lot of select statements:
where id = '1'
where id = '2'
This is the method mentioned in Sequentially run through a MYSQL table with 1,000,000 records?.
But I’d like to know if there is a faster way using something like a cursor that would be used to sequentially read a big file without needing to load the full set into memory. The way I see it, a cursor would be much faster than running millions of select statements to get the data back in manageable chunks.
Ideally, you get the DBMS to do the work for you. You make the SQL statement so it runs solely in the database, not returning data to the application. All else apart, this saves the overhead of 237 million messages to the client and 237 million messages back to the server.
Whether this is feasible depends on the nature of the update:
idvalues be changed at all?If the
idvalues will never be changed, then you can arrange to partition the data into manageable subsets, for any flexible definition of ‘manageable’.You may need to consider transaction boundaries; can it all be done in a single transaction without blowing out the logs? If you do operations in subsets rather than as a single atomic transaction, what will you do if your driving process crashes at 197 million rows processed? Or the DBMS crashes at that point? How will you know where to resume operations to complete the processing?