The Archive Base

The Archive Base Latest Questions

Editorial Team
Asked: May 22, 2026 (2026-05-22T17:20:54+00:00)

I have some examples that are giving me headaches:
I produce thread divergence, but I cannot figure out which branch, or which statements, are executed first.

First example:
I have the following kernel, which I launch with 2 threads in 1 block,
with a[0] = 0 and a[1] = 0.

__global__ void branchTest_kernel( float* a){

  int tx = threadIdx.x;

  if(tx==0){                   // or tx==1
     a[1] = a[0] + 1;          // (a)
  }else if(tx==1){             // or tx==0
     a[0] = a[1] + 1;          // (b)
  }
}

Output

a[0] = 1  
a[1] = 1 

I assume that, because the two threads are in one warp, they execute in lockstep, so (a) and (b) both read a[0] and a[1] at the same time.

Second example:
Exactly the same as the first, but with the else if replaced by a plain else:

__global__ void branchTest_kernel( float* a){

  int tx = threadIdx.x;

  if(tx==0){
     a[1] = a[0] + 1;  // (a)
  }else{
     a[0] = a[1] + 1;  // (b)
  }
}

Output

a[0] = 1  
a[1] = 2 

What causes (b) to suddenly run first and (a) second? (Probably the innermost branch goes first.)
Can somebody explain what the precedence rules for branches are, or where to find such information?

I encountered this example while implementing a Gauss-Seidel solver:
Gauss Seidel (see Figure 3, (a) diagonal block)

1 Answer

  1. Editorial Team
    Added an answer on May 22, 2026 at 5:20 pm

    There are no precedence rules for the execution order of divergent branches within a warp in CUDA; the behaviour is undefined. The compiler, assembler and JIT runtime are free to reorder instructions as they see fit, and you absolutely must not rely on whatever order you deduce empirically, because it can change (as you have found out). The only way to enforce formal correctness in that sort of situation is to use an atomic memory access operation, which will force serialization. Better still, look for another algorithm.
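
As a rough illustration of the point about atomics, here is a minimal sketch with two variants of the kernel from the question. The first uses atomicAdd so that the result no longer depends on which branch runs first. The second is not from the answer above: it is an assumption on my part that, when one specific order is really wanted, a block-wide __syncthreads() barrier between the two statements is one way to spell that order out. The kernel names and the host helper runBranchDemo are hypothetical.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Order-independent variant: both branches update the same element with
// atomicAdd, so the final value is 3 whichever branch executes first.
__global__ void atomicBranch_kernel(float* a) {
    int tx = threadIdx.x;
    if (tx == 0) {
        atomicAdd(&a[0], 1.0f);
    } else if (tx == 1) {
        atomicAdd(&a[0], 2.0f);
    }
}

// Explicitly ordered variant (assumption, not part of the answer):
// statement (a) is finished before statement (b) starts.
__global__ void orderedBranch_kernel(float* a) {
    int tx = threadIdx.x;
    if (tx == 0) {
        a[1] = a[0] + 1.0f;   // (a)
    }
    // Block-wide barrier, reached by every thread of the block. Writes made
    // before it are visible to all threads of the block after it.
    __syncthreads();
    if (tx == 1) {
        a[0] = a[1] + 1.0f;   // (b)
    }
}

// Hypothetical host helper: runs one variant with 2 threads in 1 block on
// a = {0, 0} and copies the result into out.
// which == 0 selects the atomic variant, anything else the ordered one.
void runBranchDemo(int which, float out[2]) {
    float* d_a = nullptr;
    cudaError_t err = cudaMalloc(&d_a, 2 * sizeof(float));
    assert(err == cudaSuccess);
    cudaMemset(d_a, 0, 2 * sizeof(float));   // all-zero bytes are 0.0f

    if (which == 0) {
        atomicBranch_kernel<<<1, 2>>>(d_a);
    } else {
        orderedBranch_kernel<<<1, 2>>>(d_a);
    }

    // cudaMemcpy on the default stream waits for the kernel to finish.
    err = cudaMemcpy(out, d_a, 2 * sizeof(float), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(d_a);
}
```

With a = {0, 0} on entry, the atomic variant should leave a[0] = 3 and a[1] = 0, and the ordered variant a[0] = 2 and a[1] = 1, independent of how the warp schedules its branches. The barrier only orders threads within one block, so this sketch does not extend to dependencies across blocks.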

    In your Gauss-Seidel case, the orthodox approach is to use a separate kernel launch for each color in the graph-coloring decomposition of the matrix or computational grid.
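
To make the one-launch-per-color idea concrete, here is a minimal sketch for the simplest case I could think of: a two-color (red-black) Gauss-Seidel sweep for the 1D Poisson problem -u'' = f with fixed boundary values. The problem, the names gaussSeidelColor_kernel and redBlackGaussSeidel, and the launch parameters are all assumptions for illustration; the solver in the question may need a different coloring.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Updates only the interior points of one color.
// color 0 -> i = 1, 3, 5, ...   color 1 -> i = 2, 4, 6, ...
// The neighbours i-1 and i+1 always have the other color, so no thread in
// this launch reads a value that another thread in the same launch writes.
__global__ void gaussSeidelColor_kernel(float* u, const float* f,
                                        int n, float h2, int color) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    int i = 1 + color + 2 * k;
    if (i <= n) {
        u[i] = 0.5f * (u[i - 1] + u[i + 1] + h2 * f[i]);
    }
}

// Hypothetical host driver. u and f have n + 2 entries (n >= 1 interior
// points); u[0] and u[n + 1] hold the boundary values and are never written.
void redBlackGaussSeidel(std::vector<float>& u, const std::vector<float>& f,
                         float h, int sweeps) {
    int n = static_cast<int>(u.size()) - 2;
    size_t bytes = u.size() * sizeof(float);

    float* d_u = nullptr;
    float* d_f = nullptr;
    cudaError_t err = cudaMalloc(&d_u, bytes);
    assert(err == cudaSuccess);
    err = cudaMalloc(&d_f, bytes);
    assert(err == cudaSuccess);
    cudaMemcpy(d_u, u.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_f, f.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 128;
    int perColor = (n + 1) / 2;                  // upper bound per color
    int blocks = (perColor + threads - 1) / threads;

    for (int s = 0; s < sweeps; ++s) {
        // One launch per color. Launches on the same stream run in order,
        // so the second color sees the values written by the first.
        for (int color = 0; color < 2; ++color) {
            gaussSeidelColor_kernel<<<blocks, threads>>>(d_u, d_f, n,
                                                         h * h, color);
        }
    }

    err = cudaMemcpy(u.data(), d_u, bytes, cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(d_u);
    cudaFree(d_f);
}
```

The ordering between the two colors comes from the kernel launch boundary, not from anything inside a warp, which is why this avoids the problem in the question. For f = 0 with boundary values 0 and 1, the iteration should converge to a straight line between the two boundary values.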


© 2021 The Archive Base. All Rights Reserved