I have a CUDA convolution kernel that is called very often (it is used for real-time rendering). Should I cudaMalloc and cudaFree each time I call the kernel? I tried storing the pointer returned by cudaMalloc and then just cudaMemcpy’ing data before each kernel execution, but I experienced weird behavior (like empty memory after the kernel execution).
I was also thinking about using pinned memory, but if I have to allocate and free it every time, it could even slow the application down. How should I proceed for a kernel that gets called very often?
It sounds like what you’re doing (allocating once with cudaMalloc and reusing the buffer via cudaMemcpy) should work.
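A minimal sketch of that allocate-once pattern, assuming a simple 1D convolution. The convolve kernel, the ConvBuffers struct and the initBuffers/processFrame/freeBuffers helpers are hypothetical stand-ins, not the original poster's code. The device buffers and the pinned host buffers (cudaMallocHost) are allocated once before the render loop, each frame only does the copies plus the launch, and everything is freed once at shutdown. Error checking is left out for brevity.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical 1D convolution kernel standing in for the real one.
__global__ void convolve(const float* in, float* out, const float* mask,
                         int n, int maskWidth) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int half = maskWidth / 2;
    float sum = 0.0f;
    for (int k = 0; k < maskWidth; ++k) {
        int j = i + k - half;
        if (j >= 0 && j < n) sum += in[j] * mask[k];  // zero padding at edges
    }
    out[i] = sum;
}

// Hypothetical holder for buffers that live for the whole render loop.
struct ConvBuffers {
    float *dIn, *dOut, *dMask;  // device memory
    float *hIn, *hOut;          // pinned (page-locked) host memory
    int n, maskWidth;
};

// Allocate everything once, before the first frame.
void initBuffers(ConvBuffers& b, int n, const float* mask, int maskWidth) {
    b.n = n;
    b.maskWidth = maskWidth;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&b.dIn, bytes);
    cudaMalloc(&b.dOut, bytes);
    cudaMalloc(&b.dMask, maskWidth * sizeof(float));
    cudaMallocHost(&b.hIn, bytes);   // pinned: fast transfers, costly to allocate
    cudaMallocHost(&b.hOut, bytes);
    cudaMemcpy(b.dMask, mask, maskWidth * sizeof(float), cudaMemcpyHostToDevice);
}

// Per frame: only copies and the launch, no allocation.
void processFrame(ConvBuffers& b) {
    size_t bytes = b.n * sizeof(float);
    cudaMemcpy(b.dIn, b.hIn, bytes, cudaMemcpyHostToDevice);
    convolve<<<(b.n + 255) / 256, 256>>>(b.dIn, b.dOut, b.dMask, b.n, b.maskWidth);
    // On the default stream this copy waits for the kernel to finish.
    cudaMemcpy(b.hOut, b.dOut, bytes, cudaMemcpyDeviceToHost);
}

// Free everything once, at shutdown.
void freeBuffers(ConvBuffers& b) {
    cudaFreeHost(b.hIn);
    cudaFreeHost(b.hOut);
    cudaFree(b.dIn);
    cudaFree(b.dOut);
    cudaFree(b.dMask);
}

int main() {
    const int n = 1 << 16;
    const float mask[5] = {0.1f, 0.2f, 0.4f, 0.2f, 0.1f};
    ConvBuffers b;
    initBuffers(b, n, mask, 5);
    for (int frame = 0; frame < 100; ++frame) {
        for (int i = 0; i < n; ++i) b.hIn[i] = 1.0f;  // stand-in for frame data
        processFrame(b);
    }
    // Constant input with a mask summing to ~1 should give ~1 away from the edges.
    assert(b.hOut[n / 2] > 0.999f && b.hOut[n / 2] < 1.001f);
    printf("center value: %f\n", b.hOut[n / 2]);
    freeBuffers(b);
    return 0;
}
```

Because the device-to-host cudaMemcpy is issued on the default stream, it does not start until the kernel has completed, so hOut holds the finished frame once processFrame returns.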
Maybe you have a bug in your kernel. To debug, try adding cudaGetLastError and cudaDeviceSynchronize calls after the kernel launches (cudaThreadSynchronize, the older name for the latter, is deprecated).
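A sketch of one common way to wrap those checks. The CUDA_CHECK macro, the scale kernel and launchScaleChecked are illustrative names, not part of the CUDA API. cudaGetLastError reports launch failures (for example an invalid grid configuration), while cudaDeviceSynchronize waits for the kernel and returns any error raised while it ran.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with file/line info if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "CUDA error \"%s\" at %s:%d\n",        \
                    cudaGetErrorString(err_), __FILE__, __LINE__); \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Illustrative kernel: multiply every element by a factor.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Launch the kernel, then check for launch errors and execution errors.
void launchScaleChecked(float* dData, int n, float factor) {
    scale<<<(n + 255) / 256, 256>>>(dData, n, factor);
    CUDA_CHECK(cudaGetLastError());       // e.g. invalid launch configuration
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran
}

int main() {
    const int n = 1024;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float* d = nullptr;
    CUDA_CHECK(cudaMalloc(&d, n * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice));
    launchScaleChecked(d, n, 2.0f);
    CUDA_CHECK(cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d));

    assert(h[0] == 2.0f && h[n - 1] == 2.0f);
    printf("h[0] = %f\n", h[0]);
    return 0;
}
```

The extra cudaDeviceSynchronize after every launch stalls the CPU until the GPU is idle, so in a real-time loop it is best kept to debug builds.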
Without more information, I can’t offer you any more advice than that.