Pinned (page-locked) host memory is transferred to the GPU faster than pageable memory.
CUDA provides the cudaHostAlloc and cudaHostRegister calls to allocate or register page-locked memory.
On a memory transfer, the NVIDIA driver then checks whether the host memory is locked and chooses the copy code path accordingly.
Is it possible to page-lock memory with the mlock() system call and get exactly the same effect (with regard to transfer speed) as with cudaHostRegister?
Or does the CUDA call update an internal database that the driver queries?
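
For concreteness, the mlock() route I have in mind would look roughly like the sketch below. The helper names alloc_mlocked and free_mlocked are hypothetical (they are not part of CUDA or libc), and the sketch assumes a POSIX system where the locked size stays within RLIMIT_MEMLOCK. It only shows the OS-level page-locking. Whether a transfer from such a buffer takes the same path as one registered with cudaHostRegister is exactly what I am asking.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: allocate `bytes` of page-aligned host memory and
 * page-lock it with mlock(). Returns 0 and stores the pointer in *out on
 * success, or an errno-style error code on failure. */
int alloc_mlocked(size_t bytes, void **out)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || bytes == 0 || out == NULL)
        return EINVAL;

    void *p = NULL;
    int rc = posix_memalign(&p, (size_t)page, bytes);
    if (rc != 0)
        return rc;              /* posix_memalign returns the error code directly */

    if (mlock(p, bytes) != 0) { /* may fail if RLIMIT_MEMLOCK is too small */
        rc = errno;
        free(p);
        return rc;
    }

    *out = p;
    return 0;
}

/* Hypothetical helper: unlock and release a buffer from alloc_mlocked(). */
void free_mlocked(void *p, size_t bytes)
{
    if (p != NULL) {
        munlock(p, bytes);
        free(p);
    }
}
```

The resulting pointer would then be passed to cudaMemcpy as the host side of the copy, in place of a buffer obtained from cudaHostAlloc or registered with cudaHostRegister.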