When creating a CUDA event, you can optionally turn on the cudaEventBlockingSync flag. But what is the difference between creating an event with or without the flag? I read the fine manual; it just doesn’t make sense to me. What is the “calling host thread”, and what “blocks” when you don’t use the flag?
4.6.2.7 cudaError_t cudaEventSynchronize(cudaEvent_t event)
Blocks until the event has actually been recorded. … Waiting for an event that was created with the cudaEventBlockingSync flag will cause the calling host thread to block until the event has actually been recorded.
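When you call that function, the calling thread does not return from it until the event has been recorded, that is, until all work enqueued on the stream before the matching cudaEventRecord call has finished. Then the program continues. It is a way of making sure you know the state of the running program. This is especially important in CUDA because so many operations, such as kernel launches, are asynchronous with respect to the host.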
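A minimal sketch of that pattern, assuming a hypothetical busy_kernel that just spins on clock64() to keep the GPU occupied for a while. The kernel launch and cudaEventRecord both return to the host right away, cudaEventQuery usually reports the event as still pending, and cudaEventSynchronize does not return until the kernel has finished. The names busy_kernel and launch_and_wait and the cycle count are illustrative choices, not part of the CUDA API.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: keeps one GPU thread busy for roughly `cycles` clock ticks.
__global__ void busy_kernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) {
        // spin on the GPU
    }
}

// Hypothetical helper: launches GPU work, records an event after it, and waits for it.
// Returns true if the event reports completion once cudaEventSynchronize returns.
bool launch_and_wait(long long cycles) {
    cudaEvent_t done;
    if (cudaEventCreate(&done) != cudaSuccess) return false;

    busy_kernel<<<1, 1>>>(cycles);  // asynchronous: control returns to the host at once
    cudaEventRecord(done, 0);       // the event is enqueued after the kernel in stream 0

    // Immediately after the launch the event is usually still pending.
    bool pending = (cudaEventQuery(done) == cudaErrorNotReady);
    std::printf("event pending before sync: %s\n", pending ? "yes" : "no");

    // The calling host thread does not return from this call until busy_kernel has
    // finished and the event has been recorded.
    cudaError_t err = cudaEventSynchronize(done);
    bool completed = (err == cudaSuccess) && (cudaEventQuery(done) == cudaSuccess);

    cudaEventDestroy(done);
    return completed;
}
```

Call launch_and_wait from main; it returns true once the kernel has completed. How long the wait lasts depends on the GPU clock rate.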
The “calling host thread” is the CPU thread, on the host computer in which the CUDA device resides, that made the cudaEventSynchronize call. Other host threads are not affected and keep running.
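To make “calling host thread” concrete, here is a sketch assuming C++11 std::thread and the same hypothetical busy_kernel. A worker thread calls cudaEventSynchronize while the main thread keeps looping, so only the worker waits. other_thread_progress is a made-up helper name; the iteration count it returns is simply evidence that the other thread was never stopped.

```cuda
#include <atomic>
#include <cassert>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical helper: keeps one GPU thread busy for roughly `cycles` clock ticks.
__global__ void busy_kernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) {
        // spin on the GPU
    }
}

// Hypothetical helper: a worker thread waits on the event while the main thread
// keeps working. Returns how many loop iterations the main thread completed
// during the wait.
long long other_thread_progress(long long cycles) {
    cudaEvent_t done;
    cudaEventCreate(&done);
    busy_kernel<<<1, 1>>>(cycles);
    cudaEventRecord(done, 0);

    std::atomic<bool> finished{false};
    std::thread waiter([&] {
        cudaEventSynchronize(done);  // only this (calling) host thread waits here
        finished = true;
    });

    long long iterations = 0;
    while (!finished) {  // the main thread is not blocked by the worker's wait
        ++iterations;
        std::this_thread::yield();
    }

    waiter.join();
    cudaEventDestroy(done);
    return iterations;
}
```

With a kernel that runs for tens of milliseconds, the main thread typically completes a large number of iterations before the worker returns from cudaEventSynchronize.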
Edit in response to a comment below:
I believe the difference between a “blocking sync” and a regular sync is this: with cudaEventBlockingSync, the waiting thread blocks, meaning the operating system deschedules it until the event has completed; without the flag, the thread typically “spins” while it waits, repeatedly checking the event’s status. A blocked thread therefore uses no extra CPU time while waiting and is woken up once the event completes. This is useful if, say, you’re running the program on a server where CPU time is at a premium or you pay per unit of CPU time. The trade-off is that waking a blocked thread usually takes longer than a spinning thread noticing completion, so spinning can give slightly lower latency.
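One way to observe the difference is to compare wall-clock time with CPU time while waiting on an event created with and without the flag. This sketch assumes Linux, where std::clock() reports CPU time consumed by the process (on Windows it measures elapsed wall time instead, so the comparison would not work there). It again uses the hypothetical busy_kernel plus a made-up measure_wait helper. Whether the default wait spins or yields can also depend on the device scheduling flags, so treat the numbers as indicative only.

```cuda
#include <cassert>
#include <chrono>
#include <ctime>
#include <cuda_runtime.h>

// Hypothetical helper: keeps one GPU thread busy for roughly `cycles` clock ticks.
__global__ void busy_kernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) {
        // spin on the GPU
    }
}

struct WaitCost {
    double wall_ms;  // elapsed real time spent inside cudaEventSynchronize
    double cpu_ms;   // process CPU time consumed during that wait (Linux semantics)
};

// Hypothetical helper: waits on an event created with `flags` and reports how much
// wall time versus CPU time the wait took.
WaitCost measure_wait(unsigned int flags, long long cycles) {
    cudaEvent_t done;
    cudaEventCreateWithFlags(&done, flags);  // e.g. cudaEventDefault or cudaEventBlockingSync

    busy_kernel<<<1, 1>>>(cycles);
    cudaEventRecord(done, 0);

    auto wall_start = std::chrono::steady_clock::now();
    std::clock_t cpu_start = std::clock();
    cudaEventSynchronize(done);  // spins or blocks depending on how the event was created
    std::clock_t cpu_end = std::clock();
    auto wall_end = std::chrono::steady_clock::now();

    cudaEventDestroy(done);

    double wall_ms = std::chrono::duration<double, std::milli>(wall_end - wall_start).count();
    double cpu_ms = 1000.0 * static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return WaitCost{wall_ms, cpu_ms};
}
```

Typically the cudaEventDefault run reports cpu_ms close to wall_ms, while the cudaEventBlockingSync run reports cpu_ms near zero. A similar blocking behavior can be requested for device-wide synchronization with cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync).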