I have a big loop that updates an Access database with 2,800,000 records. I divided the loop across 7 threads so that each thread works on 400,000 records. The loop takes about 0.7 seconds to update a single record because there are a lot of calculations to be done.
I am sure the threads will make the process much faster: I tested the application on a 7200 RPM HDD, an SSD, and a RAM disk, and the speed difference is not really noticeable, so I/O is not the bottleneck.
I want the first thread to process the first 400k records, the second thread to process the next 400k records, and so on.
- What would be the right way to do this?
- Should each thread have its own DataTable and BindingSource?
- How would you combine the results into one table and show it in a DataGridView when the process is done?
An Access database is simply a file. You are going to bottleneck at the point of reading from and writing to it. On top of that, there’s a significant chance of it being corrupted by doing something like this. Imagine doing the same thing with an XML file of the data.
It all depends on what you are doing to the data.
If none of the columns you change are part of a key or index, and not all records will be read or changed, then one thread to read and write plus a pool to process might get you somewhere. The processing would have to be significant enough to make it worthwhile spinning up more than one thread, though. Unless there’s a significant amount of processing, the threads are just going to be waiting on disk I/O.
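The "one thread reads and writes, a pool processes" layout could be sketched roughly as below. This is only an illustration in Python (the question itself appears to be about .NET); `process_record` and `run_pipeline` are hypothetical names, and the doubling is a placeholder for the real calculation. Note that in CPython a thread pool will not speed up CPU-bound work, so a process pool would be the closer equivalent there.

```python
from concurrent.futures import ThreadPoolExecutor


def process_record(record):
    # Hypothetical stand-in for the expensive per-record calculation.
    rec_id, value = record
    return rec_id, value * 2


def run_pipeline(records, workers=4):
    """Compute in a pool; reading and writing stay on the calling thread."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with the input rows.
        results = list(pool.map(process_record, records))
    # Only the calling thread would touch the database: write `results` back here.
    return results


records = [(i, i * 10) for i in range(8)]  # stand-in for rows read from the table
updated = run_pipeline(records)
```

The point of the design is that the database file only ever has one reader/writer, while the calculation is the only part that runs concurrently.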
If you have indexes that will change and you don’t need them in the operation, drop them, process, then put them back again.
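As a rough illustration of the drop, process, recreate sequence, here is a sketch that uses Python's built-in `sqlite3` as a stand-in for the real database. The table `items` and the index `idx_items_value` are made-up names, and Access's own DDL syntax differs slightly from SQLite's, so treat the statements as the shape of the idea rather than something to paste in.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # stand-in for the real database
con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value REAL)")
con.execute("CREATE INDEX idx_items_value ON items (value)")
con.executemany("INSERT INTO items VALUES (?, ?)",
                [(i, float(i)) for i in range(1000)])

# 1. Drop the index so it is not maintained row by row during the bulk update.
con.execute("DROP INDEX idx_items_value")

# 2. Do the processing (a placeholder update here).
con.execute("UPDATE items SET value = value * 2")

# 3. Rebuild the index once, after all rows have changed.
con.execute("CREATE INDEX idx_items_value ON items (value)")
con.commit()

index_names = [row[1] for row in con.execute("PRAGMA index_list('items')")]
```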
If you are making significant changes to the data, then maybe:
- Have one thread that reads from the existing database.
- Create seven empty databases with just this table in them (you might want to adjust the count based on a sensible number of processors).
- Read from the parent, throw the records into a processor pool (if it’s worth having one), then write to one of the “seven” copies.
- Clear out the original and write the data back from the others, serially, to put it back together.
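The steps above might look something like the following sketch. It again uses Python's `sqlite3` as a stand-in for Access; `chunk_ranges`, `process_chunk`, `run`, and the `items` table are all hypothetical, and the doubling is a placeholder for the real work. Each worker gets its own database file, so no single file is ever written by two threads.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(total, parts):
    """Split range(total) into `parts` contiguous (start, stop) ranges."""
    size, extra = divmod(total, parts)
    ranges, start = [], 0
    for i in range(parts):
        stop = start + size + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def process_chunk(path, rows):
    # Each worker creates and writes only its own database file.
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value REAL)")
    con.executemany("INSERT INTO items VALUES (?, ?)",
                    [(rid, value * 2) for rid, value in rows])  # placeholder work
    con.commit()
    con.close()


def run(parent, workdir, parts=7):
    # Single reader: only this thread touches the parent database.
    rows = parent.execute("SELECT id, value FROM items ORDER BY id").fetchall()
    jobs = [(os.path.join(workdir, f"part{i}.db"), rows[a:b])
            for i, (a, b) in enumerate(chunk_ranges(len(rows), parts))]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        list(pool.map(lambda job: process_chunk(*job), jobs))
    # Clear the original, then copy the parts back one at a time.
    parent.execute("DELETE FROM items")
    for path, _ in jobs:
        part = sqlite3.connect(path)
        parent.executemany("INSERT INTO items VALUES (?, ?)",
                           part.execute("SELECT id, value FROM items"))
        part.close()
    parent.commit()


parent = sqlite3.connect(":memory:")
parent.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value REAL)")
parent.executemany("INSERT INTO items VALUES (?, ?)",
                   [(i, float(i)) for i in range(28)])
with tempfile.TemporaryDirectory() as workdir:
    run(parent, workdir)
merged = parent.execute("SELECT id, value FROM items ORDER BY id").fetchall()
```

With the numbers from the question, `chunk_ranges(2_800_000, 7)` gives seven ranges of 400,000 records each, starting with `(0, 400000)`.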
All that said, drop Access and use a full DBMS; with one, you would probably have seen some of the improvement you expected.
Something to bear in mind when parallel processing: where’s the bottleneck? In your case it’s probably disk I/O. Multiple threads did not address that; you just ended up with seven threads twiddling their thumbs waiting for the disk drive.
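One way to check where the time actually goes is to time each phase of the loop separately before adding threads. A minimal sketch in Python, where `read_record`, `calculate`, and `write_record` are hypothetical stand-ins for the real database read, calculation, and database write:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(float)  # total seconds spent in each phase


@contextmanager
def timed(phase):
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] += time.perf_counter() - start


# Hypothetical stand-ins for the real read, calculation, and write.
def read_record(i):
    return i


def calculate(value):
    return sum(k * k for k in range(1000)) + value


def write_record(i, result):
    pass


for i in range(100):
    with timed("read"):
        value = read_record(i)
    with timed("calculate"):
        result = calculate(value)
    with timed("write"):
        write_record(i, result)

slowest = max(timings, key=timings.get)  # the phase worth optimising first
```

If "read" or "write" dominates, more threads will mostly wait on the file; if "calculate" dominates, spreading the calculation over several workers is more likely to pay off.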