I’m writing an app which needs to process a large text file (comma-separated with several different types of records – I do not have the power or inclination to change the data storage format). It reads in records (often all the records in the file sequentially, but not always), then the data for each record is passed off for some processing.
Right now this part of the application is single-threaded (read a record, process it, read the next record, etc.). I’m thinking it might be more efficient to read records into a queue in one thread, and process them in another thread in small blocks or as they become available.
I have no idea how to start programming something like that, including the data structure that would be necessary or how to implement the multithreading properly. Can anyone give any pointers, or offer other suggestions about how I might improve performance here?
You might get a benefit if you can balance the time spent processing records against the time spent reading them; in that case you could use a producer/consumer setup, for example a synchronized queue and a worker (or a few) dequeueing and processing. I might also be tempted to investigate Parallel Extensions; it is pretty easy to write an IEnumerable<T> version of your reading code, after which Parallel.ForEach (or one of the other Parallel methods) should actually do everything you want; for example:
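A minimal sketch of that idea, assuming .NET 4 or later and one record per line. The names RecordPipeline, ReadRecords, ProcessRecord and ProcessAll are hypothetical, and ProcessRecord is only a placeholder (it counts comma-separated fields) standing in for whatever your real processing does:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class RecordPipeline
{
    // Hypothetical reader: lazily yields one non-empty line at a time,
    // so the whole file never has to be held in memory.
    public static IEnumerable<string> ReadRecords(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length > 0)
            {
                yield return line;
            }
        }
    }

    // Hypothetical placeholder for the real per-record work:
    // here it just counts the comma-separated fields.
    public static int ProcessRecord(string record)
    {
        return record.Split(',').Length;
    }

    // Parallel.ForEach pulls records from the iterator (access to the
    // iterator is serialized) and runs the body on several worker threads.
    // Records are NOT processed in file order.
    public static int ProcessAll(TextReader reader)
    {
        int totalFields = 0;
        Parallel.ForEach(ReadRecords(reader), record =>
        {
            int fields = ProcessRecord(record);
            // Shared state must be updated atomically.
            Interlocked.Add(ref totalFields, fields);
        });
        return totalFields;
    }
}
```

You would call it with something like `RecordPipeline.ProcessAll(new StreamReader(path))`, where `path` is your file. This only pays off if processing a record costs noticeably more than reading it, and if the records can be processed independently and out of order. If you need explicit control over the queue (bounded size, a fixed number of workers), `BlockingCollection<T>` in `System.Collections.Concurrent` is the usual building block for the producer/consumer variant.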