The Archive Base


Editorial Team
Asked: June 11, 2026

Why is the GPU more performant in numeric calculations than the CPU? And worse at branching?


Why is the GPU more performant in numeric calculations than the CPU? And worse at branching? Can someone give me a detailed explanation of it?


1 Answer

  1. Editorial Team
    Added an answer on June 11, 2026 at 4:25 pm

    Each SM (streaming multiprocessor) in a GPU is a SIMD processor that runs the threads of a warp in lockstep, one thread per SIMD lane. When an application is compute-bound (few memory accesses relative to computation) and free of divergent branches, it can approach the GPU's peak FLOPS. Branches hurt because, when threads of the same warp take different sides of a branch, the GPU masks off the lanes on one side and executes the other side first. Both paths are then executed serially, which leaves some SIMD lanes inactive on each path and lowers performance accordingly.
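The masking behaviour described above can be sketched with a small host-side simulation. This is only an illustrative model, not how any particular GPU is implemented: the function name `run_branch`, the 4-lane warp width, and the per-path costs in cycles are all assumptions made for the example.

```python
def run_branch(cond_per_lane, then_cost, else_cost):
    """Model one warp executing `if (cond) {...} else {...}` in lockstep.

    cond_per_lane: one boolean per SIMD lane (hypothetical branch outcomes).
    then_cost / else_cost: cycles needed to execute each side (assumed values).
    Returns (total_cycles, lane_utilization).
    """
    width = len(cond_per_lane)
    then_mask = [bool(c) for c in cond_per_lane]
    else_mask = [not m for m in then_mask]

    cycles = 0
    busy_lane_cycles = 0
    # The two sides run one after the other; lanes outside the mask sit idle.
    for mask, cost in ((then_mask, then_cost), (else_mask, else_cost)):
        active = sum(mask)
        if active == 0:
            continue  # no lane takes this side, so the warp skips it entirely
        cycles += cost
        busy_lane_cycles += active * cost

    utilization = busy_lane_cycles / (cycles * width) if cycles else 1.0
    return cycles, utilization


# All four lanes agree: only one side is executed, every lane is busy.
uniform = run_branch([True, True, True, True], then_cost=10, else_cost=10)
# Lanes split two and two: both sides are executed back to back.
diverged = run_branch([True, True, False, False], then_cost=10, else_cost=10)

print(uniform)   # (10, 1.0)
print(diverged)  # (20, 0.5)
```

In this toy model the diverged warp takes twice as long and keeps only half of its lanes busy, which is the effect the answer attributes to branch divergence.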

    I’ve included a useful figure from Fung’s paper, which is publicly available at the mentioned reference, to show how performance actually drops:

    [Figure: branch divergence inside a warp, panels (a) to (e)]

    Figure (a) shows a typical case of branch divergence inside a warp (4 threads in this example). Suppose you have the following kernel code:

    A:  // some computation
        if(X){
    B:      // some computation
            if(Y){
    C:          // some computation
            }
            else{
    D:          // some computation
            }
    E:      // some computation
        }else{
    F:      // some computation
        }
    G:  // some computation
    

    Threads at A diverge into B and F. As shown in (b), some of the SIMD lanes are disabled over time, which lowers performance. Figures (c) to (e) show how the hardware executes the diverging paths serially and manages the divergence. For more information, refer to this useful paper, which is a great starting point.
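To make the A to G walk-through concrete, here is a minimal sketch that runs the kernel's control flow for a 4-thread warp and records which lanes are active in each block. The predicate values chosen for `X` and `Y` are hypothetical (they are not taken from the figure), and the recursive mask handling is a simplification of what the hardware's reconvergence mechanism does.

```python
# The kernel's control flow: a string is a basic block, a tuple is
# ("if", predicate_name, then_steps, else_steps).
KERNEL = [
    "A",
    ("if", "X",
        ["B", ("if", "Y", ["C"], ["D"]), "E"],
        ["F"]),
    "G",
]


def execute(steps, mask, preds, trace):
    """Run `steps` for the lanes enabled in `mask`, appending (block, mask) to trace."""
    for step in steps:
        if isinstance(step, str):
            trace.append((step, "".join("1" if m else "0" for m in mask)))
            continue
        _, name, then_steps, else_steps = step
        taken = [m and preds[name][lane] for lane, m in enumerate(mask)]
        not_taken = [m and not preds[name][lane] for lane, m in enumerate(mask)]
        # Diverging sides are executed serially, each with a reduced mask.
        if any(taken):
            execute(then_steps, taken, preds, trace)
        if any(not_taken):
            execute(else_steps, not_taken, preds, trace)
        # Both sides are done: the lanes reconverge and continue with `mask`.
    return trace


# Hypothetical per-lane outcomes for the two conditions (4 lanes).
preds = {
    "X": [True, True, True, False],
    "Y": [True, False, False, False],
}

trace = execute(KERNEL, [True] * 4, preds, [])
active_slots = sum(mask.count("1") for _, mask in trace)
utilization = active_slots / (4 * len(trace))

for block, mask in trace:
    print(block, mask)
print(f"lane utilization: {utilization:.2f}")
```

With these assumed predicates the warp visits all seven blocks one after another, but only A and G run with all four lanes enabled, so 18 of the 28 lane slots do useful work.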

    Compute-bound applications such as matrix multiplication or N-body simulation map well to GPUs and achieve very high performance. This is because they keep the SIMD lanes well occupied, follow a streaming model, and make few memory accesses relative to the amount of computation they perform.
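One rough way to see why matrix multiplication counts as compute-bound is to estimate its arithmetic intensity, meaning floating-point operations per byte of memory traffic. The sketch below assumes square matrices of 4-byte floats and ideal on-chip reuse (each input read once, each output written once). The element-wise vector addition is included only as a contrasting, hypothetical memory-bound example, and both function names are made up for illustration.

```python
def matmul_intensity(n, bytes_per_element=4):
    """FLOPs per byte for C = A @ B with n x n matrices, assuming ideal reuse."""
    flops = 2 * n**3                           # n^3 multiplies plus n^3 additions
    traffic = 3 * n * n * bytes_per_element    # read A and B once, write C once
    return flops / traffic


def vector_add_intensity(n, bytes_per_element=4):
    """FLOPs per byte for c[i] = a[i] + b[i] over n elements."""
    flops = n                                  # one addition per element
    traffic = 3 * n * bytes_per_element        # read a and b, write c
    return flops / traffic


print(matmul_intensity(1024))        # roughly 170.7 FLOPs per byte
print(vector_add_intensity(1024))    # roughly 0.083 FLOPs per byte
```

Under these assumptions the matrix multiply does far more arithmetic per byte moved, and the ratio grows with `n`, whereas the vector addition stays fixed at one operation per 12 bytes no matter how large the input is.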

