I wrote a kernel for OpenCL where I initialise all the elements of a

Question


Asked: June 15, 2026


I wrote an OpenCL kernel that initialises every element of a 3D array to i*i*i + j*j*j. I’m now having trouble creating a grid of threads to initialise the elements concurrently. I know that the code I have now only uses 3 threads; how can I expand on that?

Please help. I’m new to OpenCL, so any suggestion or explanation would be helpful. Thanks!

This is my code:

__kernel void initialize (
int X,
int Y,
int Z,
__global float* A) {

// Get global position in X direction
int dirX = get_global_id(0);
// Get global position in Y direction
int dirY = get_global_id(1);
// Get global position in Z direction
int dirZ = get_global_id(2);

int A[2000][100][4];
int i,j,k;
for (i=0;i<2000;i++)
{
    for (j=0;j<100;j++)
    {
        for (k=0;k<4;k++)
        {
            A[dirX*X+i][dirY*Y+j][dirZ*Z+k] = i*i*i + j*j*j;
        }
    }
}
}


1 Answer

Editorial Team · Answer 1 · 2026-06-15T06:13:08+00:00

You create the buffer that stores your output ‘A’ in the calling (host) code and pass it to your kernel as a pointer, which your function definition above already does correctly. You don’t need to declare it again inside the kernel function, so remove the line int A[2000][100][4];.

You can simplify the code greatly. Using each work-item’s 3D global ID as its 3D index into the array, you can replace the loops entirely (assuming that for a given i and j, all elements along Z should have the same value):

__kernel void initialize (__global float* A) {
  // explicit cast so that the kernel compiler knows the array dimensions
  __global float (*a)[2000][100][4] = (__global float (*)[2000][100][4])A;

  // Get global position in X direction
  int i = get_global_id(0);
  // Get global position in Y direction
  int j = get_global_id(1);
  // Get global position in Z direction
  int k = get_global_id(2);

  // compute in float: i*i*i overflows a 32-bit int once i exceeds 1290
  float fi = (float)i;
  float fj = (float)j;
  (*a)[i][j][k] = fi*fi*fi + fj*fj*fj;
}

In your calling code you would then enqueue the kernel with a three-dimensional global work-size of 2000x100x4, so that each work-item writes exactly one element.
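The pointer-to-array cast in the kernel above is only a convenience: for a row-major [2000][100][4] array it amounts to computing a flat offset into the buffer. A minimal host-side C sketch of that mapping follows; the helper name flat_index and the constants NX, NY and NZ are made up for illustration and are not part of OpenCL.

```c
#include <assert.h>
#include <stddef.h>

/* Array dimensions taken from the question: A[2000][100][4]. */
enum { NX = 2000, NY = 100, NZ = 4 };

/* Row-major offset of element [i][j][k] in an NX x NY x NZ array.
 * This is the offset the kernel's (*a)[i][j][k] expression resolves to. */
static size_t flat_index(size_t i, size_t j, size_t k) {
    assert(i < NX && j < NY && k < NZ);  /* guard against out-of-range IDs */
    return (i * NY + j) * NZ + k;
}
```

With this mapping, a kernel could equally write A[(i * 100 + j) * 4 + k] directly instead of going through the cast; which form reads better is a matter of taste.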

In practice this is a lot of work-items to schedule, so you would likely get better performance from a one-dimensional global work-size of 2000 and a loop inside the kernel, e.g.:

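__kernel void initialize (__global float* A) {
  // explicit cast so that the kernel compiler knows the array dimensions
  __global float (*a)[2000][100][4] = (__global float (*)[2000][100][4])A;

  // Get global position in X direction
  int i = get_global_id(0);
  // compute in float: i*i*i overflows a 32-bit int once i exceeds 1290
  float fi = (float)i;

  for (int j = 0; j < 100; j++) {
    float fj = (float)j;
    for (int k = 0; k < 4; k++) {
      (*a)[i][j][k] = fi*fi*fi + fj*fj*fj;
    }
  }
}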
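To check whatever buffer you read back from the device, it can help to compute the same values on the CPU. The sketch below is plain C, not OpenCL host code: it emulates a one-dimensional global work-size of 2000 by calling a stand-in for the kernel body once per global ID. The names init_row and make_reference are hypothetical, and the values are computed in float on the assumption that the array is meant to hold i^3 + j^3 without integer overflow.

```c
#include <stdlib.h>

/* Array dimensions taken from the question: A[2000][100][4]. */
enum { NX = 2000, NY = 100, NZ = 4 };

/* CPU stand-in for one work-item of the 1D kernel: fills row i. */
static void init_row(float *A, int i) {
    /* Same idea as the kernel's cast: view the flat buffer as [NX][NY][NZ]. */
    float (*a)[NY][NZ] = (float (*)[NY][NZ])A;
    float fi = (float)i;
    for (int j = 0; j < NY; j++) {
        float fj = (float)j;
        for (int k = 0; k < NZ; k++) {
            a[i][j][k] = fi * fi * fi + fj * fj * fj;
        }
    }
}

/* Emulates a global work-size of NX by visiting every global ID in turn.
 * Returns a heap buffer the caller must free, or NULL on allocation failure. */
static float *make_reference(void) {
    float *A = malloc(sizeof(float) * NX * NY * NZ);
    if (A == NULL) return NULL;
    for (int i = 0; i < NX; i++) init_row(A, i);
    return A;
}
```

Comparing the device result against this reference element by element is a quick way to confirm that the work-size and indexing are set up as intended.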
