I’m having a performance issue looping through about 1 million potential rows from a database. I basically pull the rows into a DataTable and loop through them, but it is getting slow. What alternatives are out there? I can split these rows into chunks of, say, 20,000 apiece. Can I use parallel processing in C#? Basically, the code loops through every potential record that matches a certain query and tries to figure out whether it is a legitimate entry, which is why every record needs to be visited individually. The records for a single object could reach 10 million rows. The approaches I can think of are parallel processing across multiple computers, parallel processing on a single machine with multiple cores, or some kind of change in data structure or approach.
Any opinions, thoughts, or guesses on how to make this fast and reasonable would be helpful.
First off: do not use DataTable for operations like these: DataTable is not parallelized. So again: do not use DataTable for operations like these.
Instead, use a DataReader. This allows you to start consuming and processing the data immediately instead of waiting for all of it to be loaded. The simplest version (for MS SQL Server, for example) is a loop that reads one row at a time and processes it before moving on to the next. The reader is blocked while your processing code executes, meaning no more rows are read from the DB during that time.
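A minimal sketch of such a reader loop, not the exact code from the original answer. It is written against the provider-neutral DbConnection and IDataReader types, so for MS SQL Server you would pass in an open SqlConnection. The names RowScanner, CountLegitimate, ScanQuery and the isLegitimate callback are hypothetical placeholders for your own validation logic.

```csharp
using System;
using System.Data;
using System.Data.Common;

public static class RowScanner
{
    // Streams rows from an already-open reader and counts those that pass the check.
    // isLegitimate is a hypothetical stand-in for the per-record validation.
    public static long CountLegitimate(IDataReader reader, Func<IDataRecord, bool> isLegitimate)
    {
        long count = 0;
        // Read() advances one row at a time; nothing is buffered into a DataTable.
        while (reader.Read())
        {
            // The reader itself exposes the current row as an IDataRecord.
            if (isLegitimate(reader))
            {
                count++;
            }
        }
        return count;
    }

    // Runs a query on an open connection and scans the result row by row.
    // Assumption: the caller has opened the connection (e.g. a SqlConnection).
    public static long ScanQuery(DbConnection connection, string sql, Func<IDataRecord, bool> isLegitimate)
    {
        using (DbCommand command = connection.CreateCommand())
        {
            command.CommandText = sql;
            using (DbDataReader reader = command.ExecuteReader())
            {
                return CountLegitimate(reader, isLegitimate);
            }
        }
    }
}
```

Because the check runs inside the read loop, the next row is not fetched until the check returns, so this version uses a single core.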
If the processing code is heavy, it might be worth using the Task library to create Tasks that perform the check, which would let you make use of multiple cores. However, creating a Task has some overhead; if one Task does not contain enough ‘work’, you can batch a couple of rows together and process each batch in a single Task.
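One way the batching could look, as a rough sketch under stated assumptions rather than the original answer's code. BatchedScanner, StartBatch, the batchSize parameter and the isLegitimate callback are hypothetical names. Each row's values are copied out of the reader, because the reader's current record is replaced on the next Read().

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

public static class BatchedScanner
{
    // Reads rows sequentially, groups them into batches, and checks each batch on a
    // thread-pool Task. Returns the number of rows that pass the check.
    public static long CountLegitimate(IDataReader reader, int batchSize, Func<object[], bool> isLegitimate)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var tasks = new List<Task<long>>();
        var batch = new List<object[]>(batchSize);

        while (reader.Read())
        {
            // Copy the column values so the worker Task owns its own data.
            var values = new object[reader.FieldCount];
            reader.GetValues(values);
            batch.Add(values);

            if (batch.Count == batchSize)
            {
                tasks.Add(StartBatch(batch, isLegitimate));
                batch = new List<object[]>(batchSize); // start a fresh batch
            }
        }

        // Do not forget the last, partially filled batch.
        if (batch.Count > 0)
        {
            tasks.Add(StartBatch(batch, isLegitimate));
        }

        Task.WaitAll(tasks.ToArray());
        return tasks.Sum(t => t.Result);
    }

    // One Task per batch, so the Task creation overhead is spread over many rows.
    private static Task<long> StartBatch(List<object[]> rows, Func<object[], bool> isLegitimate)
    {
        return Task.Run(() => rows.LongCount(isLegitimate));
    }
}
```

The right batch size depends on how expensive the check is, so it is worth measuring a few values. Note that this sketch does not limit how many batches are queued at once; if reading is much faster than checking, memory use can grow, and some form of throttling would be needed.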