I need to query a database for 12 million rows, process the data, and then insert the filtered results into another database.
I can’t just do a SELECT * against the database, for obvious reasons: far too much data would be returned for my program to handle, and this is a live database (customer order details), so I can’t have it crawl to a halt for ten minutes while my query runs.
I’m looking for inspiration on how to write this program. I have to process each row. I was thinking it might be best to get a count of the rows, then grab X rows at a time, wait Y seconds, and repeat until the dataset is exhausted. That way I’m not overloading the database, and since X will be sufficiently small, each batch will fit nicely in memory.
Any other suggestions or feedback?
A flat file export or a database snapshot would both be ideal.
If a flat file doesn’t suit, or you don’t have access to snapshots, then you could use a sequential id field, or create a sequential id in a temp table and iterate using that.
Something like the following:
If there is no sequential id, you can create a temp table that holds the ordering, like so:
and then process it like this:
This way the temp table holds only the record identifiers that still need to be processed.