I read from CUDA by Example, chapter 9.4, that when using atomic operations

Question

Asked: June 7, 2026

I read in CUDA by Example, chapter 9.4, that when atomic operations on GPU global memory are used improperly, the program can perform worse than a version that runs purely on the CPU, because of memory access contention.

In the worst case, the program on the GPU is highly serialized and no threads execute in parallel, which is just how a single-threaded program runs on the CPU. So the key question is how fast the program can access memory.

Judging by the example in the book, it seems that the CPU accesses host memory faster than the GPU accesses global memory on the device.

Is that so? Or are there other factors that influence the program's performance in the situation I just described?


1 Answer

Editorial Team · Answer 1 · 2026-06-07T12:08:00+00:00

I think you're misreading things slightly. Yes, the book is saying that code which ends up effectively single-threaded on the GPU is typically slower than the same code on the CPU. But that's not because of raw memory bandwidth; it's because a CPU core is much more powerful than a GPU core when running a single thread. For example, a CPU has deep pipelines, sophisticated branch prediction, and large caches with hardware prefetching to hide memory latency, while a GPU hides latency by switching to another thread while one is waiting for data. When contended atomics serialize the threads, there is nothing left to switch to, so that latency is fully exposed. The CPU is tuned for the single-threaded case, while the GPU is tuned for many threads.

If you want to know which memory is faster, look at the technical specs for your card and your motherboard, but that's not really what the book is talking about.
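
To make the contention point concrete, here is a rough sketch of a byte histogram done two ways: every thread hitting the same 256 global counters with `atomicAdd`, and each block accumulating in shared memory first and merging once at the end. This is not the book's listing. The kernel names, the `CHECK` macro, the `timeKernel` helper, the 256x256 launch configuration, and the 16 MiB input size are all assumptions made for illustration. How large the gap is (or whether there is one) depends on your GPU, so measure it yourself rather than trusting a rule of thumb.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define NUM_BINS 256
// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s (line %d)\n", cudaGetErrorString(e_), __LINE__); \
    exit(1); } } while (0)

// Every thread in the grid updates the same 256 counters in global memory,
// so atomic updates to a popular bin have to be applied one at a time.
__global__ void histoGlobal(const unsigned char *data, size_t n, unsigned int *histo) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (; i < n; i += stride)
        atomicAdd(&histo[data[i]], 1u);
}

// Each block counts into its own shared-memory histogram, then merges it
// into global memory with only NUM_BINS global atomics per block.
__global__ void histoShared(const unsigned char *data, size_t n, unsigned int *histo) {
    __shared__ unsigned int local[NUM_BINS];
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x) local[b] = 0;
    __syncthreads();
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (; i < n; i += stride)
        atomicAdd(&local[data[i]], 1u);
    __syncthreads();
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        atomicAdd(&histo[b], local[b]);
}

typedef void (*HistoKernel)(const unsigned char *, size_t, unsigned int *);

// Runs one kernel, copies the histogram back, and returns elapsed milliseconds.
static float timeKernel(HistoKernel k, const unsigned char *dData, size_t n,
                        unsigned int *dHisto, unsigned int *hOut) {
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaMemset(dHisto, 0, NUM_BINS * sizeof(unsigned int)));
    CHECK(cudaEventRecord(start));
    k<<<256, 256>>>(dData, n, dHisto);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaMemcpy(hOut, dHisto, NUM_BINS * sizeof(unsigned int), cudaMemcpyDeviceToHost));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms;
}

int main() {
    const size_t n = 16u << 20;  // 16 MiB of random bytes (arbitrary size)
    unsigned char *hData = (unsigned char *)malloc(n);
    unsigned int ref[NUM_BINS] = {0}, outG[NUM_BINS], outS[NUM_BINS];
    for (size_t i = 0; i < n; ++i) {
        hData[i] = (unsigned char)(rand() & 0xFF);
        ref[hData[i]]++;  // single-threaded CPU reference histogram
    }

    unsigned char *dData;
    unsigned int *dHisto;
    CHECK(cudaMalloc(&dData, n));
    CHECK(cudaMalloc(&dHisto, NUM_BINS * sizeof(unsigned int)));
    CHECK(cudaMemcpy(dData, hData, n, cudaMemcpyHostToDevice));

    float msG = timeKernel(histoGlobal, dData, n, dHisto, outG);
    float msS = timeKernel(histoShared, dData, n, dHisto, outS);

    bool ok = true;
    for (int b = 0; b < NUM_BINS; ++b)
        if (outG[b] != ref[b] || outS[b] != ref[b]) ok = false;

    printf("global atomics: %.3f ms\nshared atomics: %.3f ms\n", msG, msS);
    printf("results match: %s\n", ok ? "yes" : "no");

    CHECK(cudaFree(dData));
    CHECK(cudaFree(dHisto));
    free(hData);
    return ok ? 0 : 1;
}
```

Build it with something like `nvcc histo.cu -o histo` (the file name is arbitrary). Both kernels read exactly the same data from global memory; the only difference is how many threads fight over each counter, which is the point the book is making.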
