I’ve a vector on the host and I want to halve it and send


Asked: May 30, 2026, 23:40:50 UTC


I have a vector on the host, and I want to split it in half and send it to the device. A benchmark shows that CL_MEM_ALLOC_HOST_PTR is faster than CL_MEM_USE_HOST_PTR and much faster than CL_MEM_COPY_HOST_PTR. Also, memory analysis on the device doesn’t show any difference in the size of the buffer created there. This differs from what the Khronos documentation (clCreateBuffer) says about these flags. Does anyone know what’s going on?


1 Answer

Editorial Team · Answer 1 · 2026-05-30T23:40:52+00:00

First off, if I understand you correctly, clCreateSubBuffer is probably not what you want, as it creates a sub-buffer from an existing OpenCL buffer object. The documentation you linked also tells us that:

The CL_MEM_USE_HOST_PTR, CL_MEM_ALLOC_HOST_PTR and CL_MEM_COPY_HOST_PTR values cannot be specified in flags but are inherited from the corresponding memory access qualifiers associated with buffer.

You said you have a vector on the host and want to send half of it to the device. For this, I would use a regular buffer of half the vector’s size (in bytes) on the device.

With a regular buffer, the performance you measured is what one would expect:

CL_MEM_ALLOC_HOST_PTR only allocates host-accessible memory and does not incur any transfer at all: it is like calling malloc without filling the memory.
CL_MEM_COPY_HOST_PTR allocates a buffer on the device (on GPUs, most probably in the GPU’s own memory) and then copies your whole host buffer into it.
On GPUs, CL_MEM_USE_HOST_PTR most likely makes use of so-called page-locked, or pinned, memory. This kind of memory is the fastest for host-to-GPU transfers, and this is the recommended way to do the copy.

For how to use pinned memory correctly on NVIDIA devices, see section 3.1.1 of NVIDIA’s OpenCL Best Practices Guide. Note that if you use too much pinned memory, performance may drop below that of ordinary copied host memory.

The reason pinned memory is faster than copied device memory is explained well in this Stack Overflow question, as well as in the forum thread it points to.
