I am trying to implement elementwise multiply with pyopencl, but when I read the

Question

0

Asked: May 22, 20262026-05-22T12:22:29+00:00 2026-05-22T12:22:29+00:00

I am trying to implement elementwise multiply with pyopencl, but when I read the

0

I am trying to implement elementwise multiply with pyopencl, but when I read the result buffer from pyopencl, only the first 3 of 8 rows are correct. I am not sure if it is a problem with OpenCL or pyopencl. Here is my minimal example with output. I am happy for every suggestion.

Thanks

import pyopencl as cl
import numpy

# OpenCL Kernel code -----------------------------------------------------
KERNEL_CODE = """
  __kernel void eMul(
            __global float* C,
            __global float* A,
            __global float* B,
             int width, int height)
  {
      // ID
      int x = get_global_id(0);
      int y = get_global_id(1);

      // Multiplying
     C[y * width + x ] = A[y * width + x] * B[y * width + x];
  }
"""

# init OpenCL -----------------------------------------------------
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prg = cl.Program(ctx, KERNEL_CODE).build()
kernel = prg.eMul

# init host memory -----------------------------------------------------
numpy.random.seed(42)
width = 4
height = 8
cl_left= numpy.random.rand(height, width).astype(numpy.float32) * 10
cl_left = cl_left.round()
cl_right= numpy.random.rand(height, width).astype(numpy.float32) * 10
cl_right = cl_right.round()
print "\nleft\n",cl_left,"\n\nright\n",cl_right

# transfer host -> device -----------------------------------------------------
mf = cl.mem_flags

cl_result = numpy.zeros(cl_left.shape).astype(numpy.float32)
d_a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=cl_left)
d_b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=cl_right)
d_c_buf = cl.Buffer(ctx, mf.WRITE_ONLY, cl_result.nbytes)

kernel.set_arg(0,d_c_buf)
kernel.set_arg(1,d_a_buf)
kernel.set_arg(2,d_b_buf)
kernel.set_arg(3,numpy.uint32(width))
kernel.set_arg(4,numpy.uint32(height))

event = cl.enqueue_nd_range_kernel(queue,kernel,cl_result.shape,cl_result.shape)
event.wait()

# transfer device -> host -----------------------------------------------------
cl.enqueue_read_buffer(queue, d_c_buf, cl_result).wait()
print "\nresult\n", cl_result

Output:

left
[[  4.  10.   7.   6.]
 [  2.   2.   1.   9.]
 [  6.   7.   0.  10.]
 [  8.   2.   2.   2.]
 [  3.   5.   4.   3.]
 [  6.   1.   3.   4.]
 [  5.   8.   2.   5.]
 [  6.   0.   6.   2.]] 
right
[[  1.   9.  10.   8.]
 [  3.   1.   7.   4.]
 [  1.   5.   0.   9.]
 [  3.   7.   3.   5.]
 [  5.   2.  10.   8.]
 [  9.   9.   6.   9.]
 [  1.   2.   0.   3.]
 [  4.   3.   8.   4.]]

result
[[   4.   90.   70.   48.]
 [   6.    2.    7.   36.]
 [   6.   35.    0.   90.]
 [  24.   14.    6.   10.]
 [  15.   10.   40.   24.] <== till here correct
 [ 138.   69.   87.   35.] <== from here incorrect
 [ 130.   47.  109.   49.]
 [  95.   45.   25.   49.]]

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-22T12:22:30+00:00

There looks to be a bit of confusion about how you are specifying the shape of the arrays to the kernel – basically you have width and height reversed compared to how the source numpy arrays have been sized. As a result you are trying to write in column major order using a pitch of 4 words into the output array, rather than 8.

If you replace the kernel with this:

   __kernel void eMul(
                       __global float* C,
                       __global float* A,
                       __global float* B,
                        int width, int height)
    {
        // ID
        int x = get_global_id(0);
        int y = get_global_id(1);

        // Multiplying
        C[y * height + x ] = A[y * height + x] * B[y * height + x];
    }

I think you will find the results are more in line with what you were expecting.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I am trying to implement elementwise multiply with pyopencl, but when I read the

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply