The following algorithm is run iteratively in my program. running it, without the two


Asked: May 27, 2026


The following algorithm is run iteratively in my program. Running it without the two lines indicated below makes the program take 1.5X as long as not running it at all, which already surprises me. Worse, running it with those two lines increases completion time to 4.4X that of running it without them (6.6X that of not running the algorithm at all). Additionally, it causes my program to fail to scale beyond ~8 cores. In fact, when run on a single core, the two lines only increase the time to 1.7X, which is still far too high considering what they do. I’ve ruled out any effect of the modified data elsewhere in my program.

So I’m wondering what could be causing this. Something to do with the cache maybe?

void NetClass::Age_Increment(vector <synapse> & synapses, int k)  
{
    int size = synapses.size();
    int target = -1;

    if(k > -1)
    {
        for(int q=0, x=0 ; q < size; q++)
        {
            if(synapses[q].active)
                synapses[q].age++;
            else
            {
                if(x==k)target=q;
                x++;
            }
        }
        /////////////////////////////////////Causing Bottleneck/////////////
        if(target != -1) // k can exceed the number of inactive synapses
        {
            synapses[target].active = true;
            synapses[target].weight = .04 + (float (rand_r(seedp) % 17) / 100);
        }
        ////////////////////////////////////////////////////////////////////
    }

    else
    {
        for(int q=0 ; q < size; q++)
            if(synapses[q].active)
                synapses[q].age++;
    }
}
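To make the routine’s behaviour concrete outside of NetClass, here is a minimal free-function sketch. It assumes a POSIX system (rand_r), and the `synapse` struct is a hypothetical reconstruction from how the fields are used, since the question does not show it. `age_increment` is an illustrative name, and the seed is passed in as `seedp` instead of being read from a member. Unlike a bare `synapses[target]`, it returns early when no k-th inactive synapse exists, so `target == -1` never indexes the vector.

```cpp
#include <cassert>
#include <cstddef>
#include <stdlib.h>   // rand_r (POSIX, not ISO C++)
#include <vector>

// Hypothetical layout: the question never shows `synapse`, so these fields
// are inferred from how Age_Increment uses them.
struct synapse {
    bool  active = false;
    int   age    = 0;
    float weight = 0.0f;
};

// Free-function sketch of Age_Increment. Ages every active synapse and, when
// k > -1, activates the k-th inactive synapse (0-based) with a weight drawn
// from {0.04, 0.05, ..., 0.20}.
void age_increment(std::vector<synapse>& synapses, int k, unsigned int* seedp)
{
    int target = -1;
    int inactive_seen = 0;
    for (std::size_t q = 0; q < synapses.size(); ++q) {
        if (synapses[q].active) {
            ++synapses[q].age;
        } else if (k > -1) {
            if (inactive_seen == k) target = static_cast<int>(q);
            ++inactive_seen;
        }
    }
    // k may exceed the number of inactive synapses: then there is nothing to activate.
    if (target == -1) return;
    assert(!synapses[target].active);  // the chosen synapse must be inactive
    synapses[target].active = true;
    synapses[target].weight = 0.04f + static_cast<float>(rand_r(seedp) % 17) / 100;
}
```

Note that the newly activated synapse is not aged in the same call, matching the original, because activation happens after the loop.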

Update: Changing the two problem lines to:

bool x = true;
float y = .04 + (float (rand_r(seedp) % 17) / 100);

This removes the problem. Could it be something to do with memory access?


1 Answer

Editorial Team · Answer 1 · 2026-05-27T06:09:48+00:00

Each thread modifies memory that all the other threads read:

for(int q=0, x=0 ; q < size; q++)
   if(synapses[q].active) ... // ALL threads read EVERY synapse.active
...
synapses[target].active = true; // EVERY thread writes at least one synapse.active
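To gauge how many synapses a single write can disturb, here is a small compile-time sketch. It assumes a hypothetical `synapse` layout (bool, int, float; the real struct is not shown) and a 64-byte cache line, which is typical on x86-64 but is an assumption here. With common compilers the struct is padded to 12 bytes, so about five synapses share each line.

```cpp
#include <cstddef>

// Hypothetical layout inferred from the question; the real struct may differ.
struct synapse {
    bool  active;
    int   age;
    float weight;
};

// Assumed cache-line size: typical on current x86-64 CPUs, not guaranteed.
constexpr std::size_t kCacheLine = 64;

// Number of whole synapse records that fit in one cache line.
constexpr std::size_t synapses_per_line() { return kCacheLine / sizeof(synapse); }

// Index of the cache line holding the first byte of synapses[q], assuming the
// vector's storage starts on a cache-line boundary.
constexpr std::size_t line_of(std::size_t q) { return q * sizeof(synapse) / kCacheLine; }

// True when synapses[a] and synapses[b] start on the same cache line, so a
// write to one invalidates other cores' cached copies of the other.
constexpr bool same_line(std::size_t a, std::size_t b) { return line_of(a) == line_of(b); }
```

Because the functions are constexpr, the numbers can be checked with static_assert on a given platform rather than trusted blindly.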

These kinds of reads and writes on the same address from different threads cause a great deal of cache invalidation, which will result in exactly the symptoms you describe. The solution is to avoid writing shared data inside the loop, and the fact that moving the write into local variables removes the problem is, again, strong evidence that the cause is cache invalidation. Note that even if you didn’t write the same field that is being read (active), you would likely see the same symptoms due to false sharing, as I suspect that active, age and weight share a cache line.
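A minimal false-sharing sketch (C++17, link with -pthread) can make this effect visible. Each thread increments only its own counter, so nothing is shared logically; `Packed` counters sit next to each other on one cache line, while `Padded` counters are aligned to an assumed 64-byte line each. The names `Packed`, `Padded` and `run` are illustrative, not from the question.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache-line size: 64 bytes is typical on x86-64, not guaranteed.
constexpr std::size_t kLine = 64;

// Unpadded counters: several end up on the same cache line. Atomic so the
// compiler cannot keep the counter in a register and hide the memory traffic.
struct Packed {
    std::atomic<long> value{0};
};

// Each counter is aligned (and therefore padded) to a full cache line.
struct alignas(kLine) Padded {
    std::atomic<long> value{0};
};

// Starts one thread per counter; thread t touches ONLY counters[t]. Any
// slowdown of Packed relative to Padded comes from cache lines bouncing
// between cores. Counters must start at zero. Returns wall-clock milliseconds.
template <typename Counter>
long long run(std::vector<Counter>& counters, long iters)
{
    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < counters.size(); ++t)
        threads.emplace_back([&counters, t, iters] {
            for (long i = 0; i < iters; ++i)
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& th : threads) th.join();
    const auto stop = std::chrono::steady_clock::now();

    // Sanity check: every thread's increments landed on its own counter.
    for (const auto& c : counters) assert(c.value.load() == iters);
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

Comparing `run` on a `std::vector<Packed>(4)` and a `std::vector<Padded>(4)` with a few million iterations usually shows the packed version running noticeably slower on a multicore machine; the exact ratio depends on the hardware.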

For more details, see “CPU Caches and Why You Care” by Scott Meyers.

A final note: the assignments to active and weight, not to mention the age++ increment, all seem extremely thread-unsafe. Interlocked operations or lock/mutex protection for such updates would be mandatory.
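In standard C++, the closest equivalent of interlocked operations is `std::atomic`. Below is a sketch of one way to make the updates safe, assuming a hypothetical `SafeSynapse` struct with an atomic age, a `std::mutex` serialising activations, and illustrative helpers `age_active`, `activate` and `age_concurrently`. This removes the data race, but atomics still need exclusive ownership of the cache line. That means it does not remove the cache invalidation traffic discussed above.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical thread-safe variant of the question's synapse struct.
struct SafeSynapse {
    std::atomic<bool> active{false};
    std::atomic<int>  age{0};
    float             weight = 0.0f;  // written only inside activate()
};

// Hypothetical lock that serialises activations across threads.
std::mutex activation_mutex;

// Interlocked-style aging: fetch_add is an atomic read-modify-write, so
// concurrent callers never lose an increment (unlike a plain age++).
void age_active(std::vector<SafeSynapse>& synapses)
{
    for (auto& s : synapses)
        if (s.active.load())
            s.age.fetch_add(1);
}

// Lock-protected activation; returns false if the synapse was already active.
// weight is stored before active is published, so a reader that sees
// active == true (seq_cst load) also sees the new weight.
bool activate(SafeSynapse& s, float w)
{
    std::lock_guard<std::mutex> lock(activation_mutex);
    if (s.active.load()) return false;
    s.weight = w;
    s.active.store(true);
    return true;
}

// Usage demo: several threads age the same vector concurrently.
void age_concurrently(std::vector<SafeSynapse>& synapses, int threads, int rounds)
{
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&synapses, rounds] {
            for (int r = 0; r < rounds; ++r) age_active(synapses);
        });
    for (auto& th : pool) th.join();
}
```

With 4 threads each aging the vector 1000 times, every active synapse ends at age 4000. A plain `int` age++ would lose updates under the same workload.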
