I’m having trouble doing the parallelization on an array of numbers with CUDA

Editorial Team

Asked: June 2, 2026 (2026-06-02T16:02:26+00:00)

I’m having trouble parallelizing an operation on an array of numbers with CUDA.

For example, suppose we have an array M containing the numbers (1, 2, 3, 4, 5).

If I remove the number 2 from the array and shift everything after it to the left,
the resulting array is (1, 3, 4, 5, 5),

where M[1] = M[2], M[2] = M[3], M[3] = M[4].

My question is: how can we do this in parallel in CUDA? When the shift runs in parallel,
there may be a race condition in which the thread writing M[1] is not the first to act.
If the thread writing M[2] runs first, M[1] then reads the already overwritten M[2], and
the resulting array becomes (1, 4, 4, 5, 5). Is there any way to handle this? I’m fairly
new to CUDA, so I’m not sure what to do.

My current code is as follows:

__global__ void gpu_shiftSeam(int *MCEnergyMat, int *seam, int width, int height, int currRow)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;

    int index = i+width*j;

    if(i < width && j <height)
    {
        //shift values of -1 to the side of the image
        if(MCEnergyMat[i+width*j] == -1)
        {
            if(i+1 != width)
                MCEnergyMat[index] = MCEnergyMat[index+1];
        }
        if(seam[j] < i)
        {
            if(i+1 != width)
                MCEnergyMat[index] = MCEnergyMat[index+1];
        }
    }
}

Here seam[j] holds the column index I would like to remove from row j, and MCEnergyMat is a 2D array flattened into a 1D array. However, my code does not work, and I believe a race condition is the problem.

Thanks!


1 Answer

Editorial Team · Answer 1 · 2026-06-02T16:02:27+00:00

As talonmies notes in his comment, this sort of thing is called “stream compaction”. Here’s how you would do it with Thrust:

#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/remove.h>
#include <iostream>
#include <iterator>

int main()
{
  int data[5] = {1,2,3,4,5};
  thrust::device_vector<int> d_vec(data, data + 5);

  // new_end points to the end of the sequence after 2 has been thrown out
  thrust::device_vector<int>::iterator new_end = 
    thrust::remove(d_vec.begin(), d_vec.end(), 2);

  // erase everything after the new end
  d_vec.erase(new_end, d_vec.end());

  // prove that it worked
  thrust::host_vector<int> h_vec = d_vec;

  std::cout << "result: ";
  thrust::copy(h_vec.begin(), h_vec.end(), std::ostream_iterator<int>(std::cout, " "));
  std::cout << std::endl;

  return 0;
}

Here’s the result:

$ nvcc test.cu -run
result: 1 3 4 5
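
The Thrust example removes a value from a single 1D sequence. For the row-by-row seam case in the question, one way to avoid the race is to stop shifting in place: read from one buffer and write to a second one, so no thread ever reads an element that another thread may already have overwritten. The following is only a minimal sketch of that idea, not the Thrust approach above. The kernel name removeSeamOutOfPlace, the in/out buffer names, and the sample data are all made up for illustration, and it assumes the output is allocated one column narrower than the input.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel (not from the original post): copies each row of `in`
// into `out`, skipping the column stored in seam[row].
// `in` is width x height and `out` is (width - 1) x height, both row-major.
__global__ void removeSeamOutOfPlace(const int *in, int *out, const int *seam,
                                     int width, int height)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // column in the output
    int row = blockIdx.y * blockDim.y + threadIdx.y;

    if (col < width - 1 && row < height)
    {
        // Columns left of the seam keep their position; the others read
        // from one column to the right. `in` is never written, so the
        // order in which threads run does not matter.
        int srcCol = (col < seam[row]) ? col : col + 1;
        out[row * (width - 1) + col] = in[row * width + srcCol];
    }
}

int main()
{
    // Illustrative data only: a 5 x 2 matrix with one seam entry per row.
    const int width = 5, height = 2;
    std::vector<int> h_in = {1, 2, 3, 4, 5,
                             6, 7, 8, 9, 10};
    std::vector<int> h_seam = {1, 3};  // drop column 1 in row 0, column 3 in row 1
    std::vector<int> h_out((width - 1) * height);

    int *d_in = nullptr, *d_out = nullptr, *d_seam = nullptr;
    cudaMalloc(&d_in, h_in.size() * sizeof(int));
    cudaMalloc(&d_out, h_out.size() * sizeof(int));
    cudaMalloc(&d_seam, h_seam.size() * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_seam, h_seam.data(), h_seam.size() * sizeof(int), cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width - 1 + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    removeSeamOutOfPlace<<<grid, block>>>(d_in, d_out, d_seam, width, height);

    // This blocking copy waits for the kernel to finish before reading results.
    cudaMemcpy(h_out.data(), d_out, h_out.size() * sizeof(int), cudaMemcpyDeviceToHost);

    for (int r = 0; r < height; ++r)
    {
        for (int c = 0; c < width - 1; ++c)
            printf("%d ", h_out[r * (width - 1) + c]);
        printf("\n");
    }

    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_seam);
    return 0;
}
```

With the sample data this should print 1 3 4 5 for the first row and 6 7 8 10 for the second. The cost is a second buffer; when removing many seams in a row, the two buffers can be swapped after each pass instead of allocating a new one every time.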
