I have a small application that processes a large quantity of (relatively small) files. It runs sequentially: it loads data from a file, performs operations on it, and moves on to the next file.
I noticed that during run time, the CPU usage is not 100%, and I guess this is due to the time taken by the I/O operations on the hard drive.
So the idea would be to load the next data in memory in parallel with the processing of the current data, using a separate thread (the data in question would simply be a sequence of int, stored in a vector). This seems a very common problem, but I have a hard time finding a simple, plain C++ example to do that!
And now that C++0x is on its way, a simple demo using the new thread facility, with no external library, would be very nice.
Also, although I know this depends on a lot of things, is it possible to make an educated guess about the benefits (or drawbacks) of such an approach, with respect to the size of the data file to load, for example? I guess that with large files, actual disk I/O operations are infrequent anyway, since the data is already buffered (by fstream?)
Olivier
A toy program showing how to use some C++0x threading and synchronization facilities. I have no idea what the performance of this is (I recommend Matt’s answer); my focus is on clarity and correctness for the sake of making an example.
The files are read separately, as you requested. They’re not converted to a sequence of int however, as I feel this is more related to processing rather than strict I/O. So the files are dumped into a plain std::string.
Some notes:
- ... std::string container are all logically related. You may as well replace them with a thread-safe container/channel
- ... std::async instead of std::thread because it has better exception-safety characteristics
- ... boost::variant<std::string, std::exception_ptr> to pass the error on to the processing side of things (here the error is passed as an exception but you can use an error_code or anything you fancy).
Not an exhaustive list by any means.
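For reference, here is a minimal sketch of the kind of program described above. It is not the original listing. The names FileChannel, read_files, process and process_files are hypothetical, and it makes two assumptions: unreadable files are silently skipped, and the "processing" is just a sum of whitespace-separated ints. A reader task started with std::async dumps each file into a std::string and pushes it onto a queue guarded by a mutex and a condition variable, while the calling thread pops and processes the contents.

```cpp
#include <condition_variable>
#include <fstream>
#include <future>
#include <iterator>
#include <mutex>
#include <queue>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical channel: the mutex, condition variable and queue that
// logically belong together, bundled into one struct.
struct FileChannel {
    std::mutex mutex;
    std::condition_variable ready;
    std::queue<std::string> contents;  // raw contents of the files read so far
    bool done = false;                 // set once the reader has finished
};

// Reader side: dump each file into a std::string and hand it over.
void read_files(const std::vector<std::string>& paths, FileChannel& channel) {
    // Marks the channel as finished on every exit path, including exceptions,
    // so the processing side can never wait forever.
    struct Closer {
        FileChannel& ch;
        ~Closer() {
            { std::lock_guard<std::mutex> lock(ch.mutex); ch.done = true; }
            ch.ready.notify_one();
        }
    } closer{channel};

    for (const auto& path : paths) {
        std::ifstream in(path, std::ios::binary);
        if (!in) continue;  // assumption: unreadable files are silently skipped
        std::string data((std::istreambuf_iterator<char>(in)),
                         std::istreambuf_iterator<char>());
        { std::lock_guard<std::mutex> lock(channel.mutex);
          channel.contents.push(std::move(data)); }
        channel.ready.notify_one();
    }
}

// Hypothetical processing step: parse whitespace-separated ints and sum them.
long long process(const std::string& text) {
    std::istringstream in(text);
    long long sum = 0;
    for (int value; in >> value;) sum += value;
    return sum;
}

// Processes files in order while the reader task loads the following ones.
std::vector<long long> process_files(const std::vector<std::string>& paths) {
    FileChannel channel;
    // std::launch::async requests a separate thread for the reader.
    auto reader = std::async(std::launch::async,
                             [&] { read_files(paths, channel); });
    std::vector<long long> results;
    for (;;) {
        std::string text;
        {
            std::unique_lock<std::mutex> lock(channel.mutex);
            channel.ready.wait(lock, [&] {
                return !channel.contents.empty() || channel.done;
            });
            if (channel.contents.empty()) break;  // reader done, queue drained
            text = std::move(channel.contents.front());
            channel.contents.pop();
        }
        results.push_back(process(text));  // lock released: I/O can overlap
    }
    reader.get();  // rethrows any exception raised in the reader task
    return results;
}
```

Calling process_files with a vector of paths returns one sum per readable file, in the order the paths were given. Whether the overlap actually pays off depends on how the I/O time compares to the processing time, so it is worth measuring on your own data.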