This question is related to: Does Nvidia Cuda warp Scheduler yield? However, my question


Asked: May 28, 2026


This question is related to: Does Nvidia Cuda warp Scheduler yield?

However, my question is about forcing a thread block to yield by doing some controlled memory operation (which is heavy enough to make the thread block yield). The idea is to allow another ready-state thread block to execute on the now vacant multiprocessor.

The PTX manual v2.3 mentions (section 6.6):

…Much of the delay to memory can be hidden in a number of ways. The first is to have multiple threads of execution so that the hardware can issue a memory operation and then switch to other execution. Another way to hide latency is to issue the load instructions as early as possible, as execution is not blocked until the desired result is used in a subsequent (in time) instruction…

So it sounds like this can be achieved (despite being an ugly hack). Has anyone tried something similar, perhaps with a block_size = warp_size kind of setting?

EDIT: I’ve raised this question without clearly understanding the difference between resident and non-resident (but assigned to the same SM) thread blocks. So, the question should be about switching between two resident (warp-sized) thread blocks. Apologies!


1 Answer

Editorial Team · Answer 1 · 2026-05-28T01:20:00+00:00

In the CUDA programming model as it stands today, once a thread block starts running on a multiprocessor, it runs to completion and holds its resources (registers, shared memory) the whole time. There is no way for a thread block to yield those resources other than returning from the __global__ function (the kernel) that it is executing.

Multiprocessors will switch among warps of all resident thread blocks automatically, so thread blocks can “yield” to other resident thread blocks: when one warp stalls on a memory operation, the scheduler issues instructions from another ready warp. But a thread block can’t yield to a non-resident thread block without exiting, which means it can’t resume.
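To get a feel for how many warp-sized blocks can be resident on one multiprocessor at once (and so can be switched among by the hardware), a minimal sketch along these lines may help. It assumes the CUDA runtime occupancy API is available; copyKernel and warpsPerBlock are hypothetical names made up for this illustration, and the numbers reported depend entirely on the device and the kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: each thread copies one element.
__global__ void copyKernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Hypothetical helper: number of warps a block of blockSize threads occupies.
__host__ __device__ inline int warpsPerBlock(int blockSize, int warpSize) {
    return (blockSize + warpSize - 1) / warpSize;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("No usable CUDA device found\n");
        return 1;
    }

    // One warp per block, as suggested in the question.
    const int blockSize = prop.warpSize;

    // Ask the runtime how many blocks of this kernel can be resident on a
    // single multiprocessor at the same time (no dynamic shared memory).
    int residentBlocks = 0;
    cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &residentBlocks, copyKernel, blockSize, 0);
    if (err != cudaSuccess) {
        std::printf("Occupancy query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("Device: %s (%d multiprocessors)\n",
                prop.name, prop.multiProcessorCount);
    std::printf("Block size: %d threads (%d warp(s) per block)\n",
                blockSize, warpsPerBlock(blockSize, prop.warpSize));
    std::printf("Resident blocks per multiprocessor: %d\n", residentBlocks);
    std::printf("Blocks resident device-wide at once: %d\n",
                residentBlocks * prop.multiProcessorCount);
    return 0;
}
```

This should build with something like nvcc occupancy.cu -o occupancy (the file name is arbitrary). Any blocks in the grid beyond the device-wide resident count simply wait until a resident block exits; the warp scheduler only ever switches among the resident ones.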
