what is better? I need to process data in several steps and it appears

Question

Asked: June 3, 2026 (2026-06-03T05:19:05+00:00)

What is better? I need to process data in several steps, and it appears to me that I have two options:
1) use one big kernel
2) use streams with one kernel for each step

There is some latency before a kernel is executed, but does it really matter in this case? Is the latency for one big kernel the same as the sum of the latencies for several smaller kernels?

Are there any advantages to one approach over the other?

Thanks guys.

1 Answer

Editorial Team · Answer 1 · 2026-06-03T05:19:07+00:00

Launch latency for a kernel on a Fermi card is on the order of 10 µs, so it is nothing to worry about. That makes sense: to render a scene in a game, one has to run many different shaders (which are kernels).
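If you want to check the figure on your own card, a rough sketch along these lines may help. It assumes a hypothetical empty kernel (`emptyKernel`) and a hypothetical helper (`averageLaunchMicroseconds`), and it times a batch of back-to-back launches with CUDA events, so the result is an average per-launch cost rather than a precise latency for one launch.

```cuda
#include <cuda_runtime.h>

// Hypothetical empty kernel: it does no work, so the measured time
// is dominated by launch overhead rather than computation.
__global__ void emptyKernel() {}

// Hypothetical helper: returns the average time per launch, in
// microseconds, over `launches` back-to-back launches.
float averageLaunchMicroseconds(int launches) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch, excluded from the timing.
    emptyKernel<<<1, 1>>>();
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    for (int i = 0; i < launches; ++i) {
        emptyKernel<<<1, 1>>>();
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);   // wait until all launches have finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);   // elapsed time in milliseconds

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms * 1000.0f / launches;
}
```

Calling `averageLaunchMicroseconds(1000)` from `main` and printing the result gives a number to compare against the runtime of your real kernels. Error checking is left out to keep the sketch short.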

A kernel has to read the data that it will process from global memory and write the results back to global memory, so each separate kernel incurs its own full read/write cycle. You may be able to speed things up if you can chain multiple steps together in one big kernel that is still bracketed by a single read/write cycle.

As an example, if you need to perform operations A, B and C, chaining them might give you READ – A – B – C – WRITE while separate kernels would give you READ – A – WRITE – READ – B – WRITE – READ – C – WRITE.
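To make the two patterns concrete, here is a minimal sketch. The operations are made up for illustration (A adds 1, B doubles, C subtracts 3), and all kernel names and the `runPipeline` helper are hypothetical. It assumes a non-empty input and omits error checking.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Separate kernels: each one reads from and writes to global memory.
__global__ void stepA(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = d[i] + 1.0f;   // READ - A - WRITE
}
__global__ void stepB(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = d[i] * 2.0f;   // READ - B - WRITE
}
__global__ void stepC(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = d[i] - 3.0f;   // READ - C - WRITE
}

// Fused kernel: the intermediate value stays in a register.
__global__ void fusedABC(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = d[i];   // READ once
        x = x + 1.0f;     // A
        x = x * 2.0f;     // B
        x = x - 3.0f;     // C
        d[i] = x;         // WRITE once
    }
}

// Runs either variant on a copy of `input` and returns the result.
std::vector<float> runPipeline(const std::vector<float>& input, bool fused) {
    int n = static_cast<int>(input.size());
    size_t bytes = n * sizeof(float);
    float* d = nullptr;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, input.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (fused) {
        fusedABC<<<blocks, threads>>>(d, n);
    } else {
        // Launched into the same (default) stream, so they run in order.
        stepA<<<blocks, threads>>>(d, n);
        stepB<<<blocks, threads>>>(d, n);
        stepC<<<blocks, threads>>>(d, n);
    }

    std::vector<float> out(n);
    // Blocking copy on the default stream: it waits for the kernels above.
    cudaMemcpy(out.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return out;
}
```

Both variants produce the same values; the difference is that the fused kernel touches global memory twice per element while the three-kernel version touches it six times. How much that matters depends on how memory-bound the real steps are.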

Remember, even if you run just a single kernel, you can still keep your code readable by breaking the individual steps out into separate device functions.
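A minimal sketch of that layout might look like the following. The step bodies are placeholders, and the names `applyA`, `applyB`, `applyC`, `processAll` and `process` are all hypothetical; the sketch assumes a non-empty input and skips error checking.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each step lives in its own __device__ function (placeholder bodies).
__device__ float applyA(float x) { return x + 1.0f; }
__device__ float applyB(float x) { return x * 2.0f; }
__device__ float applyC(float x) { return x - 3.0f; }

// One kernel chains the steps: one read and one write per element.
__global__ void processAll(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = applyC(applyB(applyA(in[i])));
    }
}

// Host-side helper that copies the data over, runs the kernel,
// and copies the result back.
std::vector<float> process(const std::vector<float>& input) {
    int n = static_cast<int>(input.size());
    size_t bytes = n * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, input.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    processAll<<<blocks, threads>>>(dIn, dOut, n);

    std::vector<float> out(n);
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

Small device functions like these are typically inlined by the compiler, so the readability usually comes at little or no runtime cost.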
