I would like to parallelise a sequential operation (fitting a complicated mathematical function to a dataset) across multiple processors.
Assume I have 8 cores in my machine and I want to fit 1000 datasets. What I expect is a system that takes the 1000 datasets as a queue and sends them to the 8 cores for processing, so it starts by taking the first 8 of the 1000 in FIFO order. The fitting time generally differs from one dataset to another, so some of the 8 datasets being fitted may take longer than others. What I want is for the system to save the results of the fitted datasets and then, for each thread that is done, resume taking new datasets from the big queue (the 1000 datasets). This has to continue until all 1000 datasets have been processed. Then I can move on with my program.
What is such a system called? And are there models for it in C++?
I parallelise with OpenMP and use advanced C++ techniques such as templates and polymorphism.
Thanks for any help.
What you describe is usually called dynamic scheduling (or dynamic load balancing over a work queue). You can use either an OpenMP `parallel for` loop with a dynamic schedule or OpenMP tasks. Both can parallelise cases where each iteration takes a different amount of time to complete.
With a dynamically scheduled `for` loop, `schedule(dynamic,1)` makes each thread execute one iteration at a time, and threads are never left idle unless there are no more iterations to process.
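A minimal sketch of the dynamically scheduled loop, assuming a hypothetical `Dataset` type, `Result` struct and `fit()` function (none of these come from the question; `fit()` here just returns the mean so the example is checkable). Results go into a pre-sized vector indexed by dataset, so each thread writes only its own slot:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the question's types: a dataset is a list of
// samples, and a fit result is a single fitted parameter.
using Dataset = std::vector<double>;
struct Result {
    double param;
};

// Hypothetical fit(): it reads only its own dataset and writes no shared
// state, so it is safe to call from several threads at once. Here it just
// returns the mean; a real fit would take a data-dependent amount of time.
Result fit(const Dataset& data) {
    assert(!data.empty());
    double sum = 0.0;
    for (double x : data) sum += x;
    Result r;
    r.param = sum / static_cast<double>(data.size());
    return r;
}

// Fits all datasets. Each iteration writes only to its own element of
// results, so no locking is needed.
std::vector<Result> fit_all(const std::vector<Dataset>& datasets) {
    std::vector<Result> results(datasets.size());
    // Signed loop index, which OpenMP 2.0 compilers require.
    const long n = static_cast<long>(datasets.size());
    // schedule(dynamic, 1): whenever a thread finishes a dataset it takes the
    // next unprocessed index, so long fits do not leave other threads idle.
    #pragma omp parallel for schedule(dynamic, 1)
    for (long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        results[k] = fit(datasets[k]);
    }
    // The implicit barrier at the end of the loop means every fit is done here.
    return results;
}
```

Compile with OpenMP enabled (e.g. `-fopenmp` on GCC or Clang); without it the pragma is ignored and the loop runs serially with the same results.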
In the task-based version, one of the threads runs a `for` loop that creates 1000 OpenMP tasks. OpenMP tasks are kept in a queue and executed by idle threads. This works much like a dynamically scheduled loop but allows greater freedom in how the code is structured (e.g. with tasks you can parallelise recursive algorithms). The `taskwait` construct waits for the completion of the child tasks generated by the current task. All tasks are also guaranteed to be complete at the implicit barrier at the end of the parallel region, so an explicit `taskwait` is only necessary if more code follows before the end of the parallel region.
In both cases each call to `fit()` may run in a different thread. You have to make sure that fitting one dataset does not affect fitting the others, i.e. that `fit()` is a thread-safe method or function. Both approaches also require that the time to execute `fit()` is much larger than the overhead of the OpenMP constructs. OpenMP tasking requires an OpenMP 3.0 compliant compiler. At the time of writing this ruled out all versions of MS Visual C++ (even the one in VS2012 supported only OpenMP 2.0), should you happen to develop on Windows.
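The task-based variant could look like the following sketch. It reuses the same hypothetical `Dataset`, `Result` and `fit()` stand-ins, and `fit_all_tasks()` is likewise an illustrative name. One thread inside `single` creates one task per dataset, and the whole team executes them:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins, as in the dynamic-loop sketch.
using Dataset = std::vector<double>;
struct Result {
    double param;
};

// Hypothetical thread-safe fit(): returns the mean of its own dataset.
Result fit(const Dataset& data) {
    assert(!data.empty());
    double sum = 0.0;
    for (double x : data) sum += x;
    Result r;
    r.param = sum / static_cast<double>(data.size());
    return r;
}

std::vector<Result> fit_all_tasks(const std::vector<Dataset>& datasets) {
    std::vector<Result> results(datasets.size());
    #pragma omp parallel
    {
        // One thread walks the list and creates one task per dataset; the
        // tasks are queued and picked up by whichever threads are idle.
        #pragma omp single
        {
            for (std::size_t i = 0; i < datasets.size(); ++i) {
                // firstprivate(i): each task keeps the index it was created with.
                #pragma omp task firstprivate(i)
                results[i] = fit(datasets[i]);
            }
            // Wait for the child tasks created above. Strictly redundant here,
            // because the implicit barrier at the end of single also guarantees
            // their completion, but needed if later code in this region used results.
            #pragma omp taskwait
        }
    }
    return results;
}
```

The `task` and `taskwait` directives need an OpenMP 3.0 compiler; without OpenMP enabled the pragmas are ignored and the loop runs serially.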
If you'd like only one instance of the fitter ever to be initialised per thread, you should take a somewhat different approach, e.g. make the fitter object global and `threadprivate`.
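A minimal sketch of the global `threadprivate` approach. `Fitter`, its members, `g_setup_count` and `fit_all()` are hypothetical. The class deliberately has no constructor and only records when its stand-in one-time setup runs, which makes the once-per-thread behaviour observable:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how often the (hypothetical) expensive setup ran, for checking.
std::atomic<int> g_setup_count(0);

// Hypothetical Fitter whose costly setup should run at most once per thread.
// No constructor: the global below has static storage duration, so each
// thread's copy starts zero-initialised (ready_ == false).
class Fitter {
public:
    double fit(const std::vector<double>& data) {
        if (!ready_) {          // first use of this thread's copy
            ++g_setup_count;    // stand-in for expensive initialisation
            ready_ = true;
        }
        assert(!data.empty());
        double sum = 0.0;
        for (double x : data) sum += x;
        return sum / static_cast<double>(data.size());
    }
private:
    bool ready_;
};

// Global Fitter; threadprivate gives each thread its own persistent copy.
Fitter fitter;
#pragma omp threadprivate(fitter)

std::vector<double> fit_all(const std::vector<std::vector<double>>& datasets) {
    std::vector<double> results(datasets.size());
    const long n = static_cast<long>(datasets.size());
    #pragma omp parallel for schedule(dynamic, 1)
    for (long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        // Each thread uses its own fitter, so no locking is needed.
        results[k] = fitter.fit(datasets[k]);
    }
    return results;
}
```

The setup counter can never exceed the number of threads that actually fitted something, however many datasets there are.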
`fitter` is a global instance of the `Fitter` class. The `omp threadprivate` directive instructs the compiler to put it in thread-local storage, i.e. to make it a per-thread global variable. Such variables persist between different parallel regions. You can also use `omp threadprivate` on `static` local variables. These too persist between different parallel regions, but are only visible inside the function that declares them.
The `omp_set_dynamic(0)` call disables dynamic teams, i.e. each parallel region will always execute with as many threads as requested (e.g. by the `OMP_NUM_THREADS` environment variable). This matters because threadprivate values are only guaranteed to persist between parallel regions that use the same number of threads with dynamic teams disabled.
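The `static` local variant and the `omp_set_dynamic(0)` call can be combined as in the sketch below. `Workspace`, `FitOutput`, `disable_dynamic_teams()` and `fit_all()` are hypothetical names. `cumulative_fits` sums each thread's persistent counter, so a second call reporting twice the number of datasets shows that the per-thread copies survived between the two parallel regions:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical per-thread state; trivially constructible, so every thread's
// copy of a static instance starts zero-initialised.
struct Workspace {
    bool ready;  // has this thread's one-time setup run?
    long fits;   // datasets fitted by this thread so far, across all calls
};

struct FitOutput {
    std::vector<double> results;
    long cumulative_fits;  // sum of Workspace::fits over the team's threads
};

// Call once at program start: with dynamic teams off, every parallel region
// gets the requested number of threads, a precondition for threadprivate
// data to persist between regions.
void disable_dynamic_teams() {
#ifdef _OPENMP
    omp_set_dynamic(0);
#endif
}

FitOutput fit_all(const std::vector<std::vector<double>>& datasets) {
    // static local + threadprivate: one copy per thread that survives between
    // parallel regions, but is visible only inside this function.
    static Workspace ws;
    #pragma omp threadprivate(ws)

    FitOutput out;
    out.results.assign(datasets.size(), 0.0);
    const long n = static_cast<long>(datasets.size());
    long total = 0;

    #pragma omp parallel
    {
        #pragma omp for schedule(dynamic, 1)
        for (long i = 0; i < n; ++i) {
            if (!ws.ready) {
                ws.ready = true;  // expensive per-thread setup would go here
            }
            ++ws.fits;
            const std::size_t k = static_cast<std::size_t>(i);
            const std::vector<double>& data = datasets[k];
            assert(!data.empty());
            double sum = 0.0;
            for (double x : data) sum += x;
            out.results[k] = sum / static_cast<double>(data.size());
        }
        // After the loop's implicit barrier, every thread adds its own count.
        #pragma omp atomic
        total += ws.fits;
    }
    out.cumulative_fits = total;
    return out;
}
```

If dynamic teams stayed enabled, the runtime could use a different number of threads in the second region, and the persistence (and thus the second count) would no longer be guaranteed.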