Why is the GPU more performant in numeric calculations than the CPU? And worse at branching?


Asked: June 11, 2026 (2026-06-11T16:25:53+00:00)


Why is the GPU more performant in numeric calculations than the CPU? And worse at branching? Can someone give me a detailed explanation of it?


1 Answer

Editorial Team · Answer 1 · 2026-06-11T16:25:55+00:00

Each streaming multiprocessor (SM) in a GPU is a SIMD processor that executes the threads of a warp in lockstep, one thread per SIMD lane. When an application is compute-bound (few memory accesses) and free of divergent branches, it can approach the GPU's peak FLOPS. Branching hurts because, when the threads of a warp take different sides of a branch, the GPU masks off the lanes on one side and executes the other side first. The two paths are executed serially, so some SIMD lanes sit idle on each path, which reduces performance.

A figure from Fung's paper, which is publicly available at the reference mentioned, shows how performance actually drops (figure not reproduced here; panels (a) to (e) are described below).

Figure (a) shows a typical case of branch divergence inside a warp (4 threads in this example). Suppose you have the following kernel code:

A:  // some computation
    if(X){
B:      // some computation
        if(Y){
C:          // some computation
        }
        else{
D:          // some computation
        }
E:      // some computation
    }else{
F:      // some computation
    }
G:  // some computation

Threads at A diverge into B and F. As shown in (b), some of the SIMD lanes are disabled over time, which lowers performance. Figures (c) to (e) show how the hardware serially executes the diverging paths and manages the divergence. For more information, refer to this useful paper, which is a great starting point.
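To make the lane masking concrete, here is a minimal host-side sketch in C++ that models a 4-thread warp stepping through blocks A to G of the kernel above. It is not how the hardware is implemented and it is not taken from Fung's paper: the names `simulate`, `issue`, `active_lane_slots`, and `utilization` are made up for illustration, the per-lane outcomes of `X` and `Y` are hypothetical inputs, every block is assumed to cost one time step, and the order in which the two sides of a branch run is an assumption.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kWarpSize = 4;  // 4 threads per warp, as in the example
using Mask = std::bitset<kWarpSize>;  // bit i set => lane i is active

struct Step {
    char block;   // label of the basic block (A..G)
    Mask active;  // lanes enabled while the block executes
};

// Record a block only if at least one lane executes it;
// a path that no thread takes is skipped entirely.
static void issue(std::vector<Step>& trace, char block, Mask active) {
    if (active.any()) trace.push_back({block, active});
}

// x and y are hypothetical per-lane outcomes of the conditions X and Y.
std::vector<Step> simulate(Mask x, Mask y) {
    std::vector<Step> trace;
    Mask all;
    all.set();
    const Mask then_x = all & x;   // lanes that enter the if(X) side
    const Mask else_x = all & ~x;  // lanes that enter the else side
    issue(trace, 'A', all);
    issue(trace, 'B', then_x);
    issue(trace, 'C', then_x & y);
    issue(trace, 'D', then_x & ~y);
    issue(trace, 'E', then_x);     // inner if/else reconverges here
    issue(trace, 'F', else_x);     // other side of X runs afterwards
    issue(trace, 'G', all);        // whole warp reconverges here
    return trace;
}

// Number of (step, lane) slots that did useful work.
std::size_t active_lane_slots(const std::vector<Step>& trace) {
    std::size_t n = 0;
    for (const Step& s : trace) n += s.active.count();
    return n;
}

// Fraction of SIMD lane slots that were active over the whole trace.
double utilization(const std::vector<Step>& trace) {
    assert(!trace.empty());
    return static_cast<double>(active_lane_slots(trace)) /
           static_cast<double>(trace.size() * kWarpSize);
}
```

For example, `simulate(Mask("0111"), Mask("0001"))` (lanes 0 to 2 take X, only lane 0 takes Y) yields 7 steps with 18 of 28 lane slots active, roughly 64% utilization under this model. With no divergence, `simulate(Mask("1111"), Mask("1111"))` needs only 5 steps (A, B, C, E, G) at 100% utilization.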

Compute-bound applications such as matrix multiplication or N-body simulation map well to GPUs and achieve very high performance. This is because they keep the SIMD lanes well occupied, follow a streaming model, and make few memory accesses relative to the amount of computation.
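One rough way to quantify "few memory accesses" is arithmetic intensity, i.e. floating-point operations per byte moved. The sketch below compares a dense N×N matrix multiply with an element-wise vector add. It assumes single precision and ideal reuse (every input and output element crosses the memory bus exactly once), which real kernels only approximate; the `Workload` struct and the function names are illustrative, not part of any library.

```cpp
#include <cassert>
#include <cstdint>

struct Workload {
    double flops;  // floating-point operations performed
    double bytes;  // bytes moved to or from memory (idealized)
};

// Dense N x N matrix multiply: N^3 multiplies + N^3 adds,
// read A and B once, write C once.
Workload matmul(std::uint64_t n) {
    const double N = static_cast<double>(n);
    return {2.0 * N * N * N, 3.0 * N * N * sizeof(float)};
}

// Element-wise vector add of length N: one add per element,
// read two inputs, write one output.
Workload vector_add(std::uint64_t n) {
    const double N = static_cast<double>(n);
    return {N, 3.0 * N * sizeof(float)};
}

// FLOPs per byte; higher means more work per memory access.
double intensity(const Workload& w) {
    assert(w.bytes > 0.0);
    return w.flops / w.bytes;
}
```

Under these assumptions the matrix multiply has an intensity of N/6 FLOPs per byte, which grows with the problem size, while the vector add stays fixed at 1/12 FLOP per byte no matter how large N is. That is why the former can keep the SIMD lanes busy and the latter ends up waiting on memory.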
