I am using OpenMP to parallelize a few statements with the parallel for construct. The parallelized for loop looks like:
double solverFunction::apply(double* parameters_ , Model* varModel_)
{
double functionEvaluation = 0;
Command* command_ = 0;
Model* model_ = 0;
#pragma omp parallel for shared (functionEvaluation) private (model_,command_)
for (int i=rowStart;i<rowEnd+1;i++)
{
model_ = new Model(varModel_);
model_->addVariable("i", i);
model_->addVariable("j", 1);
command_ = formulaCommand->duplicate(model_);
functionEvaluation += command_->execute().toDouble();
}
}
It works most of the time. Execution time is dramatically reduced, and the result is as expected. However, from time to time, especially for big problems (a large number of iterations over i, a large amount of data to copy in the copy constructor call
model_ = new Model(varModel_);
, others?), it crashes. The call stack ends in classes such as qAtomicBasic (the program is written in C++/Qt) and QHash, and I suspect it crashes because of concurrent read/write access to memory.
However, model_ and command_ are private, so each thread works on its own copy of each. In the variable model_, I copy varModel_, so that the pointer passed as an argument is not altered by the threads. Likewise, command_ is a copy of the member variable formulaCommand (duplicate is roughly a copy constructor).
The possible flaws I have identified in my code are:
- functionEvaluation may be modified by several threads simultaneously
- the copy constructor in the statement
model_ = new Model(varModel_);
reads the members of varModel_ from memory to construct the new (model_) instance. Concurrent access to the data members of varModel_ could occur, although this only reads their values (assigning them to other variables) and does not alter them.
Also, I see only two possible improvements (which I cannot test for a few days, but I am asking for advice anyway):
- add an atomic directive, so that functionEvaluation is not written concurrently
- add a reduction(+:functionEvaluation) clause, so that concurrent access to functionEvaluation is handled automatically
Do these solutions seem to solve the problem correctly, and which is more efficient in general? Where can the problem lie in the code I wrote? What are the solutions?
Thanks a lot!
The first observation is that, as you’ve noticed yourself, modifying functionEvaluation concurrently is a bad idea. It will fail.
The read-only access of varModel_, on the other hand, is not a problem. Neither is the copy constructor call (but where is it? Your code doesn’t show it).
Unrelatedly, using the private clause in C++ is a bad idea. Just declare the thread-private variables inside the parallel block (in this case, the for loop).
I also don’t see why you are using pointers here. Their use doesn’t make immediate sense: use stack-allocated objects instead.
The following modified code should work (I’ve also taken the liberty of unifying the coding style … why the trailing underscores?):
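A minimal sketch along those lines, not a drop-in replacement. Model and Command below are hypothetical stand-ins (the real classes are not shown in the question), the free function apply with explicit rowStart and rowEnd parameters replaces the member function, and the assert is an assumed precondition. The points it illustrates are the reduction clause, the loop-local variables, and the stack-allocated objects.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the question's Model: a set of named variables.
class Model {
public:
    void addVariable(const std::string& name, double value) { variables[name] = value; }
    double variable(const std::string& name) const { return variables.at(name); }
private:
    std::map<std::string, double> variables;
};

// Hypothetical stand-in for the formula command; here it simply computes i * j.
class Command {
public:
    double execute(const Model& model) const {
        return model.variable("i") * model.variable("j");
    }
};

double apply(const Model& varModel, const Command& formulaCommand,
             int rowStart, int rowEnd)
{
    assert(rowStart <= rowEnd); // assumed precondition, not taken from the question

    double functionEvaluation = 0;

    // Each thread accumulates into its own partial sum; OpenMP adds the
    // partial sums together when the loop finishes.
    #pragma omp parallel for reduction(+:functionEvaluation)
    for (int i = rowStart; i <= rowEnd; ++i) {
        // Declared inside the loop body, so every iteration gets its own
        // objects and no private clause is needed.
        Model model(varModel);            // copy; varModel is only read
        model.addVariable("i", i);
        model.addVariable("j", 1);
        Command command(formulaCommand);  // copy of the shared command
        functionEvaluation += command.execute(model);
    }

    return functionEvaluation;
}
```

With these stand-ins, apply(Model(), Command(), 1, 100) sums the integers from 1 to 100. Putting #pragma omp atomic on the += line would also remove the race, but it synchronises every single update, so the reduction is usually the cheaper choice.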
Note that, due to inherent floating point inaccuracies, this code may yield different results from the sequential code. This is unavoidable.
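To see why the summation order matters, here is a small illustration that does not use OpenMP at all. The function names are made up for the example. It adds the same three values in two different orders, which is what a reduction effectively does when it combines per-thread partial sums, and on IEEE 754 doubles (without aggressive options such as -ffast-math) the two totals differ in the last bit.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Adds the values front to back, as the sequential loop would.
double sumForward(const std::vector<double>& values) {
    double total = 0;
    for (std::size_t k = 0; k < values.size(); ++k) total += values[k];
    return total;
}

// Adds the same values back to front, mimicking a different combination order.
double sumBackward(const std::vector<double>& values) {
    double total = 0;
    for (std::size_t k = values.size(); k-- > 0; ) total += values[k];
    return total;
}

// Returns true when the two orders give results that are not bitwise equal.
bool orderChangesResult() {
    const std::vector<double> values = {0.1, 0.2, 0.3};
    const double forward = sumForward(values);
    const double backward = sumBackward(values);
    // The discrepancy is tiny, so compare with a tolerance rather than ==.
    assert(std::fabs(forward - backward) < 1e-12);
    return forward != backward;
}
```

In practice this means comparing the parallel result against the sequential one with a small tolerance instead of expecting exact equality.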