Pinned (page-locked) host memory is transferred to the GPU faster than pageable memory.
CUDA provides the cudaHostAlloc and cudaHostRegister calls to allocate or register page-locked memory. On a memory transfer, the NVIDIA driver then checks whether the host memory is locked and chooses the corresponding copy code path.
Is it possible to page-lock memory with the mlock() system call and get exactly the same effect (with regard to transfer speed) as with cudaHostRegister? Or does the corresponding CUDA call update an internal database that the driver queries?
I think the NVIDIA driver maintains its own page-locked memory, accessible via cudaHostAlloc etc. The system call mlock() uses the kernel's page locking, which is technically equivalent to what the driver does, but kernel page locking is tightly resource-restricted by RLIMIT_MEMLOCK, which is very small. Thus the NVIDIA driver uses its own page-locking mechanism. And they warn about excessive usage, since it takes a lot of memory away from the rest of the system.
So, cudaHostRegister is equivalent to mlock() in the sense that it page-locks memory, but not in the sense that it is bound by that resource limit, and not in the sense that cudaMemcpy is accelerated: locking with mlock() alone does not speed up the transfer.
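To see how restrictive the kernel limit actually is on a given machine, something like the following C sketch could be used. It assumes a Linux-like system; memlock_soft_limit and try_mlock are illustrative helper names made up for this example, not part of any library. It only exercises getrlimit() and mlock() and says nothing about what the CUDA driver does internally.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes mlock() and RLIMIT_MEMLOCK */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Illustrative helper: fetch the soft RLIMIT_MEMLOCK limit in bytes.
 * Returns 0 on success, -1 on failure. RLIM_INFINITY means "no limit". */
int memlock_soft_limit(rlim_t *out)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    *out = rl.rlim_cur;
    return 0;
}

/* Illustrative helper: allocate `bytes`, try to mlock() the buffer,
 * then unlock and free it again.
 * Returns 0 if the lock succeeded, otherwise the errno set by mlock()
 * (or ENOMEM if the allocation itself failed). */
int try_mlock(size_t bytes)
{
    assert(bytes > 0);

    void *buf = malloc(bytes);
    if (buf == NULL)
        return ENOMEM;
    memset(buf, 0, bytes);          /* touch the pages so they are backed */

    int err = 0;
    if (mlock(buf, bytes) != 0)
        err = errno;                /* e.g. ENOMEM or EPERM when over the limit */
    else
        munlock(buf, bytes);

    free(buf);
    return err;
}
```

Calling memlock_soft_limit() and then try_mlock() with a size just above the reported value, as an unprivileged user, should show the mlock() call failing. The limit can be inspected from a shell with ulimit -l.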