The following snippet is from the water-nsq benchmark in SPLASH-2:
```c
if (comp_last > NMOL1)
{
    for (mol = StartMol[ProcID]; mol < NMOL; mol++)
    {
        pthread_mutex_lock(&gl->MolLock[mol % MAXLCKS]);
        for (dir = XDIR; dir <= ZDIR; dir++)
        {
            temp_p = VAR[mol].F[DEST][dir];
            temp_p[H1] += PFORCES[ProcID][mol][dir][H1];
            temp_p[O]  += PFORCES[ProcID][mol][dir][O];
            temp_p[H2] += PFORCES[ProcID][mol][dir][H2];
        }
        pthread_mutex_unlock(&gl->MolLock[mol % MAXLCKS]);
    }
    comp = comp_last % NMOL;
    for (mol = 0; ((mol <= comp) && (mol < StartMol[ProcID])); mol++)
    {
        pthread_mutex_lock(&gl->MolLock[mol % MAXLCKS]);
        for (dir = XDIR; dir <= ZDIR; dir++)
        {
            temp_p = VAR[mol].F[DEST][dir];
            temp_p[H1] += PFORCES[ProcID][mol][dir][H1];
            temp_p[O]  += PFORCES[ProcID][mol][dir][O];
            temp_p[H2] += PFORCES[ProcID][mol][dir][H2];
        }
        pthread_mutex_unlock(&gl->MolLock[mol % MAXLCKS]);
    }
}
else
{
    for (mol = StartMol[ProcID]; mol <= comp_last; mol++)
    {
        pthread_mutex_lock(&gl->MolLock[mol % MAXLCKS]);
        for (dir = XDIR; dir <= ZDIR; dir++)
        {
            temp_p = VAR[mol].F[DEST][dir];
            temp_p[H1] += PFORCES[ProcID][mol][dir][H1];
            temp_p[O]  += PFORCES[ProcID][mol][dir][O];
            temp_p[H2] += PFORCES[ProcID][mol][dir][H2];
        }
        pthread_mutex_unlock(&gl->MolLock[mol % MAXLCKS]);
    }
}
pthread_barrier_wait(&(gl->start));
```
The problem is that the result is not deterministic at the barrier at the end: if you execute this code twice with the same inputs, it gives different answers. In other words, if the order in which the mutexes are acquired changes, the results are different.
And yes, I have verified this by inspecting the memory pages. I can also assure you that the change occurs in the memory of VAR (pointed to by temp_p).
I want to know why, because as far as I can see, every thread adds its own values (PFORCES[ProcID]…) to the sums pointed to by temp_p, so at the end, at the barrier, the results should be the same no matter the order in which the threads acquired the locks.
[EDITED]
Also, please note that the variables comp, dir, and mol are all local to the thread and therefore not shared.
Second try.
I can't check it, but I assume that in `temp_p[H1] += PFORCES[ProcID][mol][dir][H1];` you are adding doubles or floats. For floating-point types, the order of addition matters! Floating-point addition is not associative!
A different thread order means a different addition order, so changes in the outcome are to be expected.
See http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems for some explanation.