Given a float[n] inputdata to pass to the OpenCL kernel, could anyone enlighten me on the difference between the following three ways to pass this to the kernel:
A)
cl_mem input = clCreateBuffer(context, CL_MEM_USE_HOST_PTR, Sizeof.cl_float * n,
inputdata, NULL);
clSetKernelArg(kernel, i, Sizeof.cl_mem, Pointer.to(input));
B)
clSetKernelArg(kernel, i, Sizeof.cl_float * n, Pointer.to(inputdata));
C)
cl_mem input = clCreateBuffer(context, CL_MEM_options_here, Sizeof.cl_float * n,
NULL, NULL);
clEnqueueWriteBuffer(command_queue, input, CL_TRUE, 0, Sizeof.cl_float * n,
inputdata, 0, NULL, NULL);
clSetKernelArg(kernel, i, Sizeof.cl_mem, Pointer.to(input));
?
Have I understood correctly that the difference between A) and C) is that C) copies the entire array once at the start and then works with it on the GPU, while A) has to load its data on the fly? So A) is good if one needs only a small portion of an array, and C) is the way to go if you use the entire array anyway?
And what about B)? Is it more like A), more like C), or something different altogether?
Yes, you cannot pass large amounts of data as kernel arguments the way B) does. There is a device-dependent upper limit on the combined size of all arguments (typically in the KiB range; you can query it with clGetDeviceInfo and CL_DEVICE_MAX_PARAMETER_SIZE). With methods A) and C) you can pass much larger buffers (hundreds of megabytes). A) is not helpful for OpenCL 1.1 and lower, as the buffer will usually still be copied, but with OpenCL 1.2 you can avoid one copy if your host and device are the same (for instance, when you are running a CPU OpenCL runtime).
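To make the size argument concrete, here is a minimal sketch in plain Java (no OpenCL calls) of the arithmetic behind the limit on B). The class and method names are made up for illustration, and the 1024-byte limit in main is only a placeholder. On a real device you would use whatever clGetDeviceInfo reports for CL_DEVICE_MAX_PARAMETER_SIZE.

```java
// Sketch only: the limit used in main() is a placeholder, not a value
// reported by any particular device.
final class KernelArgSize {
    // Size of an OpenCL float in bytes (the value JOCL exposes as Sizeof.cl_float).
    static final int CL_FLOAT_BYTES = 4;

    /** Bytes that n floats would occupy if passed by value, as in B). */
    static long bytesForFloats(int n) {
        return (long) n * CL_FLOAT_BYTES;
    }

    /**
     * True if n floats plus the other kernel arguments stay within
     * maxParameterSize, i.e. the value queried via
     * CL_DEVICE_MAX_PARAMETER_SIZE. All arguments count toward the limit,
     * so otherArgBytes covers the remaining ones.
     */
    static boolean fitsAsKernelArgument(int n, long otherArgBytes, long maxParameterSize) {
        return bytesForFloats(n) + otherArgBytes <= maxParameterSize;
    }

    public static void main(String[] args) {
        long limit = 1024; // hypothetical device limit in bytes
        System.out.println(fitsAsKernelArgument(128, 0, limit));       // 512 bytes needed
        System.out.println(fitsAsKernelArgument(1_000_000, 0, limit)); // about 4 MB needed
    }
}
```

With the placeholder limit, the 128-element array fits and the million-element array does not. Data of that second size is where the buffer-based methods A) and C) are the only options.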