If multiple threads are simultaneously writing to a single memory location, there will be a race condition, right?

Question

Asked: June 18, 2026 (2026-06-18T07:26:38+00:00)

If multiple threads write to a single memory location simultaneously, there will be a race condition, right?
That is exactly what is happening in my case.

Consider this fragment from ‘reduce.cl’:

int i = get_global_id(0);
int n, j;

n = keyMobj[i];                       // n is the key; it can be either 0 or 1
for (j = 0; j < 2; j++)
    sumMobj[n*2+j] += dataMobj[i].dattr[j];        // summing operation

Here, since the key is 0 or 1, the index n*2+j ranges over 0 to 3, and the memory locations
sumMobj[0] and sumMobj[1] are accessed by 4 threads simultaneously, and
sumMobj[2] and sumMobj[3] are accessed by 6 threads simultaneously.

Is there any way to still do this in parallel, for example with locks or semaphores? This summation is a very large part of my algorithm.

1 Answer

Editorial Team · Answer 1 · 2026-06-18T07:26:39+00:00

I can give you some hints, as I faced a similar problem.

I can think of three different methods for achieving this goal:

Consider a simple kernel, assuming you launched 4 threads (0-3):

__kernel void addition(__global int *p)
{
    int i = get_local_id(0);
    p[4] += p[i];    // race: every thread reads and writes p[4]
}

You want to add the values p[0], p[1], p[2], p[3], and p[4], and store the final sum in p[4], right? That is:

p[4]= p[0] + p[1] + p[2] + p[3] + p[4]

Method 1 (no parallelism)

Assign the whole job to a single thread:

int i = get_local_id(0);
if (i == 0)
{
    // only thread 0 runs this, so nothing else touches p[4]
    for (int k = 0; k < 4; k++)
        p[4] += p[k];
}

Method 2 (with parallelism)

Express your problem as follows:

p[4]= p[0] + p[1] + p[2] + p[3] + p[4] + 0

This is a reduction problem.

So launch 3 threads, i=0 to i=2. In the first iteration:

 i=0 finds p[0] + p[1]
 i=1 finds p[2] + p[3]  
 i=2 finds p[4] + 0

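Now you have three partial sums. Apply the same logic again to add them, padding with 0 as needed (here, up to four values, a power of two). Each iteration must finish before the next one starts, so the threads have to synchronize between iterations.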
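To make the rounds concrete, here is a minimal host-side sketch in plain C, not an OpenCL kernel. The function name tree_reduce is made up for illustration. Each iteration of the inner loop stands in for one thread, and the end of the inner loop stands in for the synchronization point between rounds. Note one deliberate difference from the pairing shown above: thread i adds element i + half into element i instead of adding adjacent elements, so within a round no two threads touch the same slot.

```c
#include <assert.h>
#include <stddef.h>

/* Sums p[0..n-1] in pairwise rounds and returns the total.
   The array is overwritten with partial sums; the result ends up in p[0]. */
int tree_reduce(int *p, size_t n)
{
    if (n == 0)
        return 0;
    assert(p != NULL);

    while (n > 1) {
        size_t half = (n + 1) / 2;      /* number of slots that survive this round */

        /* Each i plays the role of one thread. Thread i reads p[i] and
           p[i + half] and writes only p[i], so the iterations are independent.
           If n is odd, the last slot has no partner (same as adding 0). */
        for (size_t i = 0; i + half < n; i++)
            p[i] += p[i + half];

        /* In a kernel, a barrier would go here before the next round. */
        n = half;
    }
    return p[0];
}
```

For the five values in the example this takes three rounds (5 to 3 to 2 to 1 slots). In a real kernel the same loop body would typically run on work-group local memory with barrier(CLK_LOCAL_MEM_FENCE) between rounds, but that part is not shown here.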

Method 3 (atomic operations)

If you still need to implement this atomically, you can use atomic_add():

  int atomic_add(volatile __global int *p, int val)

Description

Read the 32-bit value (referred to as old) stored at location pointed
by p. Compute (old + val) and store result at location pointed by p.
The function returns old.

This assumes the data is of type int. Otherwise, see the link suggested above.
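The fetch-and-add behaviour described above can be tried on the host with C11's atomic_fetch_add, which also returns the old value. The following is a minimal sketch in plain C, not OpenCL. The Item struct, the function name sum_by_key, and the int type of dattr are assumptions, since the question does not show those definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical layout: the question only shows that dattr has 2 entries. */
typedef struct { int dattr[2]; } Item;

/* Adds each item's two attributes into the slots selected by its key.
   Every update is an atomic read-modify-write, so this function may be
   called from several threads on the same sum array without lost updates. */
void sum_by_key(atomic_int *sum, const int *key, const Item *data, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        int n = key[i];
        assert(n == 0 || n == 1);   /* the key is stated to be 0 or 1 */
        for (int j = 0; j < 2; j++)
            atomic_fetch_add(&sum[n * 2 + j], data[i].dattr[j]);
    }
}
```

In the kernel from the question, the corresponding change would be to replace the += line with atomic_add(&sumMobj[n*2+j], dataMobj[i].dattr[j]), again assuming int data. Atomics serialize the threads that hit the same location, so with only four output slots this may be slower than a reduction.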
