The following algorithm is run iteratively in my program. running it, without the two lines indicated below, takes 1.5X as long as without. That is very surprising to me as it is. Worse, however, is that running with those two lines increases completion to 4.4X of running without them (6.6X not running whole algorithm). Additionaly, it causes my program to fail to scale beyond ~8 cores. In fact, when run on a single core, the two lines only increase time to 1.7x, which is still way too high considering what they do. I’ve ruled out that it has to do with an effect of the modified data elsewhere in my program.
So I’m wondering what could be causing this. Something to do with the cache maybe?
void NetClass::Age_Increment(vector <synapse> & synapses, int k)
{
int size = synapses.size();
int target = -1;
if(k > -1)
{
for(int q=0, x=0 ; q < size; q++)
{
if(synapses[q].active)
synapses[q].age++;
else
{
if(x==k)target=q;
x++;
}
}
/////////////////////////////////////Causing Bottleneck/////////////
synapses[target].active = true;
synapses[target].weight = .04 + (float (rand_r(seedp) % 17) / 100);
////////////////////////////////////////////////////////////////////
}
else
{
for(int q=0 ; q < size; q++)
if(synapses[q].active)
synapses[q].age++;
}
}
Update: Changing the two problem lines to:
bool x = true;
float y = .04 + (float (rand_r(seedp) % 17) / 100);
Removes the problem. Suggesting maybe that it’s something to do with memory access?
Each thread modifies memory all the other reads read:
These kind of reads and writes on the same address from different threads cause a great deal of cache invalidation, which will result in exactly the symptoms you describe. The solution is to avoid the write inside the loop, and the fact that moving the write into local variables is, again, proof that the problem is cache invalidation. Note that even if you wouldn’t write the sane field being read (
active), you would likely see the same symptoms due to false sharing, as I suspect thatactive,ageandweightshare a cache line.For more details see CPU Caches and Why You Care
A final note is that the assignment to
activeandweight, not to mention theage++increment all seem extremely thread unsafe. Interlocked operations or lock/mutex protection for such updates would be mandatory.