Pinned or page-locked memory is transferred faster to GPUs compared to not-locked memory. CUDA

Question

Asked: June 4, 2026 (2026-06-04T08:13:35+00:00)

Pinned (page-locked) memory is transferred to GPUs faster than memory that is not locked.
CUDA provides the cudaHostAlloc and cudaHostRegister calls to allocate or register page-locked memory. On a memory transfer, the NVIDIA driver then checks whether the host memory is locked and chooses the corresponding copy code path.

Is it possible to page-lock memory with the mlock() system call and achieve exactly the same effect (with regard to transfer speed) as cudaHostRegister? Or does the corresponding CUDA call update an internal database which the driver queries?


1 Answer

Editorial Team · Answer 1 · 2026-06-04T08:13:36+00:00

I think the NVIDIA driver maintains its own page-locked memory, accessible via cudaHostAlloc and the related calls. The mlock() system call uses the kernel's page-locking, which is technically equivalent to what the driver does, but kernel page-locking is tightly resource-restricted: it is subject to RLIMIT_MEMLOCK, which is usually very small. That is why the NVIDIA driver uses its own page-locking mechanism. NVIDIA also warns against excessive use, since it takes a lot of memory away from the rest of the system.

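So cudaHostRegister is equivalent to mlock() in the sense that it page-locks memory, but not in the sense of being bound by the same resource limits, and not in the sense of accelerating cudaMemcpy: memory locked only with mlock() is not known to the driver as pinned, so it does not get the faster copy path.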
