I am creating a cudaStream_t in a host function:
    #include <cuda_runtime.h>

    // Placeholder kernel; its body is irrelevant to the question.
    __global__ void kernelDoesNotMatter() {}

    void callKernel(cudaStream_t* ptrStream)
    {
        kernelDoesNotMatter<<<1, 12, 0, *ptrStream>>>();
        // Here I am not calling cudaStreamSynchronize
    }

    void host_func()
    {
        cudaStream_t stream;
        cudaStreamCreate(&stream);
        callKernel(&stream);
        cudaError_t err = cudaStreamQuery(stream); // err == cudaSuccess?
    }
Here I do not call cudaStreamSynchronize() after launching the kernel in callKernel(), so why does cudaStreamQuery() return cudaSuccess? Is it because a cudaStream_t cannot be passed by pointer to another function? Am I missing something?
Thanks.
cudaStreamQuery() returns cudaSuccess if all commands on the stream have completed, and cudaErrorNotReady if work is still pending. This means that in your example, it returns cudaSuccess because the kernel has already completed.
The purpose of cudaStreamQuery() is to allow you to write code that does other things on the host thread while waiting for the stream to complete. You can do that by calling it in a loop and doing host work for as long as it returns cudaErrorNotReady. Note this is not an idle wait loop.
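
The polling pattern described above can be sketched roughly as follows. This is a minimal illustration, not code from the original answer: busyKernel, doHostWork, and pollWhileWorking are hypothetical names, and the spin length is an arbitrary assumption chosen only so that the stream is likely to still be busy when it is first queried.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: spins for roughly `cycles` clock ticks so the stream
// stays busy long enough for the host to observe cudaErrorNotReady.
__global__ void busyKernel(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical stand-in for useful host-side work done between polls.
static void doHostWork(unsigned long long* counter) { ++(*counter); }

// Launches the kernel, then polls the stream while doing host work.
// Returns the last status reported by cudaStreamQuery
// (cudaSuccess once everything queued on the stream has finished).
cudaError_t pollWhileWorking(unsigned long long* hostIterations)
{
    cudaStream_t stream;
    cudaError_t err = cudaStreamCreate(&stream);
    if (err != cudaSuccess) return err;

    busyKernel<<<1, 12, 0, stream>>>(100000000LL);

    // cudaStreamQuery never blocks: it reports cudaErrorNotReady while
    // work is pending, so the host thread keeps doing its own work.
    while ((err = cudaStreamQuery(stream)) == cudaErrorNotReady) {
        doHostWork(hostIterations);
    }

    cudaStreamDestroy(stream);
    return err;
}
```

Calling pollWhileWorking(&n) from main with an unsigned long long n = 0 leaves in n the number of times the host did work while the kernel was running. With a very short kernel, as in the question, that count may well be zero because the first query already returns cudaSuccess.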
If you want the semantics of an idle wait loop, rather than having an empty while block, it’s better to use either cudaStreamSynchronize() or a cudaEvent. The latter gives you more flexibility, since you can wait on a specific event recorded (with cudaEventRecord()) after a specific kernel or other call on the specified stream: cudaEventSynchronize() blocks the host thread until that event has completed, while cudaStreamWaitEvent() makes another stream wait for it without blocking the host.
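
As a rough sketch of the event-based approach, the following records an event right after one kernel, makes a second stream wait on it, and also blocks the host on it. The kernels producer and consumer and the function runWithEvent are hypothetical names introduced only for illustration, and error checking is reduced to the final copy to keep the example short.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernels: producer fills a buffer, consumer doubles each element.
__global__ void producer(int* data) { data[threadIdx.x] = threadIdx.x; }
__global__ void consumer(const int* in, int* out) { out[threadIdx.x] = in[threadIdx.x] * 2; }

// Returns out[11] (which is 22 when the ordering is respected), or -1 on error.
int runWithEvent()
{
    const int n = 12;
    int *in = nullptr, *out = nullptr;
    cudaStream_t s1, s2;
    cudaEvent_t produced;

    cudaMalloc(&in, n * sizeof(int));
    cudaMalloc(&out, n * sizeof(int));
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);
    cudaEventCreate(&produced);

    producer<<<1, n, 0, s1>>>(in);
    cudaEventRecord(produced, s1);         // marks the point right after producer

    cudaStreamWaitEvent(s2, produced, 0);  // device side: s2 waits, host does not block
    consumer<<<1, n, 0, s2>>>(in, out);

    cudaEventSynchronize(produced);        // host side: blocks until producer is done
    cudaStreamSynchronize(s2);             // blocks until consumer is done as well

    int last = -1;
    cudaError_t err = cudaMemcpy(&last, out + (n - 1), sizeof(int),
                                 cudaMemcpyDeviceToHost);

    cudaEventDestroy(produced);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(in);
    cudaFree(out);
    return err == cudaSuccess ? last : -1;
}
```

The event marks one specific point in s1, so later work queued on s1 would not delay either wait. cudaStreamSynchronize(s1), by contrast, would wait for everything queued on that stream.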