A follow up Q from: CUDA: Calling a __device__ function from a kernel

Question

Editorial Team

Asked: June 8, 2026

A follow up Q from: CUDA: Calling a __device__ function from a kernel

I’m trying to speed up a sort operation. A simplified pseudo version follows:

// some costly swap operation
__device__ void swap(float* ptrA, float* ptrB){
  float saveData;         // swap some
  saveData = *ptrA;       //   big complex
  *ptrA = *ptrB;          //     data chunk
  *ptrB = saveData;
}

// a rather simple sort operation
__global__ void sort(float data[]){
  for (int i = 0; i < limit; i++){
    // find left swap point
    // find right swap point
    swap<<<1,1>>>(left, right);  // intent: run the swap elsewhere, in parallel
  }
}

(Note: This simple version doesn’t show the reduction techniques in the blocks.)
The idea is that it is easy (fast) to identify the swap points, while the swap operation itself is costly (slow). So use one block to identify the swap points and use other blocks to do the swap operations, i.e. do the actual swapping in parallel.
This sounds like a decent plan. But if the compiler inlines the device calls, then no parallel swapping takes place.
Is there a way to tell the compiler NOT to inline a device call?

1 Answer

Editorial Team · Answer 1 · 2026-06-08T17:33:43+00:00

Edit (2016):

Dynamic parallelism was introduced with the second generation of Kepler-architecture GPUs. Launching kernels from device code is supported on devices of compute capability 3.5 and higher.
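As a rough illustration of what a device-side launch could look like for the swap idea in the question, here is a minimal sketch. All names (swapPairs, findAndSwap, reverseOnDevice) are hypothetical, and the "swap points" are just mirrored positions that reverse the array, a stand-in for whatever search the real sort performs. Note that only a __global__ function can be launched with <<<...>>>, so the swap is written as a kernel rather than a __device__ function.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Child kernel (hypothetical name): each thread swaps one pair of elements.
// It has to be __global__ because it is launched with <<<...>>>.
__global__ void swapPairs(float* data, const int* left, const int* right, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        float tmp = data[left[i]];
        data[left[i]] = data[right[i]];
        data[right[i]] = tmp;
    }
}

// Parent kernel (hypothetical name): a single thread records the swap points,
// then launches the child grid so the swaps themselves run in parallel.
// Assumption: the swap points are mirrored positions, which reverses the array.
__global__ void findAndSwap(float* data, int* left, int* right, int n) {
    if (blockIdx.x != 0 || threadIdx.x != 0) return;
    int count = n / 2;
    for (int i = 0; i < count; ++i) {
        left[i] = i;
        right[i] = n - 1 - i;
    }
    if (count > 0) {
        int threads = 128;
        int blocks = (count + threads - 1) / threads;
        // Device-side launch: requires compute capability 3.5 or higher.
        swapPairs<<<blocks, threads>>>(data, left, right, count);
    }
}

// Host helper (hypothetical name): copies the data over, runs the parent
// kernel, and copies the result back. Returns false if any CUDA call fails.
bool reverseOnDevice(std::vector<float>& values) {
    int n = static_cast<int>(values.size());
    if (n < 2) return true;  // nothing to swap
    float* dData = nullptr;
    int* dLeft = nullptr;
    int* dRight = nullptr;
    size_t bytes = n * sizeof(float);
    size_t idxBytes = (n / 2) * sizeof(int);
    bool ok = cudaMalloc(&dData, bytes) == cudaSuccess
           && cudaMalloc(&dLeft, idxBytes) == cudaSuccess
           && cudaMalloc(&dRight, idxBytes) == cudaSuccess
           && cudaMemcpy(dData, values.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        findAndSwap<<<1, 1>>>(dData, dLeft, dRight, n);
        // The parent grid only completes after its child grid has finished,
        // so one host-side synchronize waits for both.
        ok = cudaDeviceSynchronize() == cudaSuccess
          && cudaMemcpy(values.data(), dData, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dData);
    cudaFree(dLeft);
    cudaFree(dRight);
    return ok;
}
```

Code that launches kernels from the device has to be built with relocatable device code and linked against the device runtime, for example `nvcc -rdc=true -arch=sm_XX file.cu -lcudadevrt`, with sm_XX at 3.5 or higher. The sketch collects all swap points first and issues one child launch covering every pair, rather than one <<<1,1>>> launch per pair, because a single larger grid is what lets the swaps actually overlap. On the inlining question itself: CUDA does provide a __noinline__ qualifier, but a __device__ function runs in the calling thread whether or not it is inlined, so suppressing inlining does not by itself add any parallelism.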

Original Answer:

You will have to wait until the end of the year when the next generation of hardware is available. No current CUDA devices can launch kernels from other kernels – it is presently unsupported.
