The Archive Base

Editorial Team
Asked: May 23, 2026

For a tutorial I’m writing, I’m looking for a “realistic” and simple example of a deadlock caused by ignorance of SIMT / SIMD.

I came up with this snippet, which seems to be a good example.

Any input would be appreciated.

…
int x = threadID / 2;
if (threadID > x) {
    value[threadID] = 42;
    barrier();
}
else {
    value2[threadID/2] = 13;
    barrier();
}
result = value[threadID/2] + value2[threadID/2];

I know it is neither proper CUDA C nor OpenCL C.

1 Answer

  1. Editorial Team
    Added an answer on May 23, 2026 at 7:19 am

    A simple deadlock that novice CUDA programmers can easily run into occurs when one tries to implement a critical section, executed by a single thread at a time, that all threads must eventually pass through. It goes more or less like this:

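    __global__ void kernel() {
      __shared__ int semaphore;
      semaphore=0;               // every thread writes the same initial value
      __syncthreads();
      while (true) {
        int prev=atomicCAS(&semaphore,0,1);  // try to take the lock
        if (prev==0) {
          //critical section
          semaphore=0;           // release the lock
          break;                 // the winner leaves the loop on its own
        }
      }
    }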
    

    The atomicCAS instruction ensures that exactly one thread gets 0 assigned to prev, while all others get 1. When that thread finishes its critical section, it sets the semaphore back to 0 so that the other threads get a chance to enter the critical section.
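
A quick way to check the claim about atomicCAS is to let every thread of a block race once on a flag that is never reset and count how many threads see the old value 0. The sketch below assumes a single block and a device with shared-memory atomics; `count_cas_winners` and `cas_winners` are hypothetical names, not part of the original answer, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Each thread of the block tries once to flip a shared flag from 0 to 1.
// atomicCAS returns the old value, so exactly one thread can observe 0.
__global__ void count_cas_winners(int *winners) {
    __shared__ int flag;
    if (threadIdx.x == 0) flag = 0;   // initialize once
    __syncthreads();

    int prev = atomicCAS(&flag, 0, 1);  // the flag is never reset here
    if (prev == 0) {
        atomicAdd(winners, 1);          // only the single winner gets here
    }
}

// Hypothetical host helper: launches one block of `threads` threads and
// returns how many of them saw prev == 0. Error checking is omitted.
int cas_winners(int threads) {
    int *d_winners = nullptr;
    int h_winners = -1;
    cudaMalloc(&d_winners, sizeof(int));
    cudaMemset(d_winners, 0, sizeof(int));
    count_cas_winners<<<1, threads>>>(d_winners);
    cudaMemcpy(&h_winners, d_winners, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_winners);
    return h_winners;
}
```

Because the flag is never released in this sketch, `cas_winners` should report a single winner whatever the block size.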

    The problem is that while one thread gets prev=0, the other 31 threads belonging to the same SIMD unit (warp) get 1. At the if-statement, the CUDA scheduler puts that single thread on hold (masks it out) and lets the other 31 threads continue their work. Normally this is a good strategy, but in this particular case you end up with one critical-section thread that is never executed and 31 threads spinning forever. Deadlock.

    Also note the break, which takes control flow out of the while loop. If you omit the break and have some more operations after the if-block that all threads are supposed to execute, it may actually help the scheduler avoid the deadlock.
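
One common way to restructure the loop along these lines is to replace the break with a flag, so the winning thread and the spinning threads meet again at the end of every iteration. This is a sketch under assumptions, not a guarantee from the CUDA programming model: on pre-Volta GPUs the outcome depends on where the compiler places the reconvergence point, while Volta and later GPUs use independent thread scheduling, which is designed to let such spin loops make progress. `locked_increment` and `locked_count` are hypothetical names, and the non-atomic `counter` exists only so the mutual exclusion can be observed.

```cuda
#include <cuda_runtime.h>

// Spin lock without `break`: the winner releases the lock and sets `done`
// inside the same iteration, so (with typical pre-Volta compilation) it
// reconverges with the spinning threads at the end of the if-block instead
// of being parked outside the loop.
__global__ void locked_increment(int *out) {
    __shared__ int semaphore;
    __shared__ volatile int counter;  // plain (non-atomic) data guarded by the lock
    if (threadIdx.x == 0) {
        semaphore = 0;
        counter = 0;
    }
    __syncthreads();

    bool done = false;
    while (!done) {
        if (atomicCAS(&semaphore, 0, 1) == 0) {  // acquire
            __threadfence_block();               // order the read below after the acquire
            counter = counter + 1;               // critical section
            __threadfence_block();               // make the write visible before release
            atomicExch(&semaphore, 0);           // release
            done = true;
        }
    }

    __syncthreads();
    if (threadIdx.x == 0) *out = counter;
}

// Hypothetical host helper: runs one block and returns the final counter.
int locked_count(int threads) {
    int *d_out = nullptr;
    int h_out = -1;
    cudaMalloc(&d_out, sizeof(int));
    locked_increment<<<1, threads>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

If the lock really serializes the threads, every increment survives and the counter equals the number of threads in the block.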

    Regarding the example in your question: in CUDA, __syncthreads() is allowed in conditional code only if the condition evaluates identically across the entire thread block, so putting it in SIMD-diverging code is forbidden. The compiler won't catch it, but the programming guide warns that execution may hang or produce unintended side effects. In practice, on pre-Fermi devices, all __syncthreads() calls are treated as the same barrier. Under that assumption, your code would actually terminate without an error. One should not rely on this behaviour, though.
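
One way to make the question's snippet legal CUDA is to keep the divergent writes inside the branches but move the barrier after the if/else, so every thread of the block reaches the same __syncthreads(). The sketch assumes a single block of 64 threads, turns `value` and `value2` into shared arrays, and zero-initializes them because the original leaves most entries unwritten; `fixed_kernel`, `run_fixed` and `BLOCK` are hypothetical names.

```cuda
#include <cuda_runtime.h>

#define BLOCK 64  // assumption: a single block of 64 threads

// The question's snippet with the barrier hoisted out of the branches,
// so every thread of the block calls the same __syncthreads().
__global__ void fixed_kernel(int *result) {
    __shared__ int value[BLOCK];
    __shared__ int value2[BLOCK];
    int threadID = threadIdx.x;

    // Assumption: zero both arrays, since the snippet never writes
    // value[0] or value2[1..BLOCK-1].
    value[threadID] = 0;
    value2[threadID] = 0;
    __syncthreads();

    int x = threadID / 2;
    if (threadID > x) {
        value[threadID] = 42;        // taken by every thread except 0
    } else {
        value2[threadID / 2] = 13;   // taken by thread 0 only
    }
    __syncthreads();                 // unconditional, so legal

    result[threadID] = value[threadID / 2] + value2[threadID / 2];
}

// Hypothetical host helper: runs one block and copies the results back.
void run_fixed(int *h_result) {
    int *d_result = nullptr;
    cudaMalloc(&d_result, BLOCK * sizeof(int));
    fixed_kernel<<<1, BLOCK>>>(d_result);
    cudaMemcpy(h_result, d_result, BLOCK * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_result);
}
```

With this initialization, threads 0 and 1 read 13 (only thread 0 takes the else branch and writes value2[0]), and every other thread reads 42.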

Related Questions

After looking at this tutorial on blobs: channel 9 , I was thinking of
I followed the tutorial from this site: http://theappleblog.com/2008/08/04/tutorial-build-a-simple-rss-reader-for-iphone/ to make my first iPhone application,
I followed this simple tutorial and created a nested repeater. This tutorial is simple
ajax tutorial on w3school at http://www.w3schools.com/ajax/ajax_database.asp In this function (function GetXmlHttpObject()), it creates a
This tutorial on mobileorchard.com uses 2 classes (or 2 sets of .h and .m)
I followed this tutorial on configuring the Rails plugin ExceptionNotifier. I know that I
This tutorial for programming these starts with programming the Ravens and Jackdaw with a
The tutorial on the django website shows this code for the models: from django.db
I've followed this tutorial for setting up a static library with common classes from
I followed this tutorial for setting Autlogic up properly . So, my site needs

© 2021 The Archive Base. All Rights Reserved