The Archive Base

Editorial Team
Asked: June 17, 2026

I am trying to write a mutex for OpenCL. The idea is for every


I am trying to write a mutex for OpenCL so that each individual work-item can run a critical section atomically. I suspect the problem is that a thread warp cannot proceed once one of its threads gets the lock.

My current kernel, which sums an array of numbers, is below. “numbers” is the input array of floats, “sum” is a one-element array that receives the result, and “semaphore” is a one-element array that holds the semaphore. I based it heavily on the example here.

void acquire(__global int* semaphore) {
    int occupied;
    do {
        occupied = atom_xchg(semaphore, 1);
    } while (occupied>0);
}
void release(__global int* semaphore) {
    atom_xchg(semaphore, 0); //the previous value, which is returned, is ignored
}
__kernel void test_kernel(__global float* numbers, __global float* sum, __global int* semaphore) {
    int i = get_global_id(0);
    acquire(semaphore);
    *sum += numbers[i];
    release(semaphore);
}
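For readers less familiar with atom_xchg, the locking idea in the kernel above can be restated as a small plain C11 sketch. It assumes atomic_exchange from <stdatomic.h> as a host-side stand-in for atom_xchg; the names try_acquire and release_lock are hypothetical and do not appear in the original code. It shows only the exchange semantics (write 1, inspect the previous value), not GPU scheduling.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One attempt at taking the lock: atomically write 1 and look at the
   value that was there before. A previous value of 0 means the lock was
   free and now belongs to the caller. (Hypothetical helper name.) */
bool try_acquire(atomic_int *semaphore)
{
    assert(semaphore != NULL);
    int occupied = atomic_exchange(semaphore, 1);
    return occupied == 0;
}

/* Free the lock by atomically writing 0; the previous value is ignored,
   as in the kernel's release(). (Hypothetical helper name.) */
void release_lock(atomic_int *semaphore)
{
    assert(semaphore != NULL);
    atomic_exchange(semaphore, 0);
}
```

The kernel's acquire() is this attempt repeated in a loop until it succeeds, so a caller that never sees the previous value 0 spins indefinitely.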

I am calling the kernel effectively like:

int numof_dimensions = 1;
size_t offset_global[1] = {0};
size_t size_global[1] = {4000}; //the length of the numbers array
size_t* size_local = NULL;
clEnqueueNDRangeKernel(command_queue, kernel, numof_dimensions,offset_global,size_global,size_local, 0,NULL, NULL);

When I run the code above, the graphics card hangs and the driver restarts itself. How can I fix it so that it doesn’t?

1 Answer

    Editorial Team
    Added an answer on June 17, 2026 at 11:13 am

    The answer may seem obvious in retrospect, but it is easy to miss.

    Because “size_local” is NULL, the OpenCL implementation chooses the local work-group size itself, and the value it picks is greater than 1. Work-items in the same work-group can then share a thread warp, and the warp locks up: once one thread takes the lock, the others keep spinning and the holder never gets to release it. To fix it, set the local size to 1 explicitly (i.e. “size_t size_local[1] = {1};”). Doing this produces a correct result.
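One way to see why a local size of 1 helps is a toy model in plain C. It is only a sketch under a strong assumption: the lanes of a warp step through the acquire loop together, and a lane that has left the loop waits until every other lane has left it too. Real GPUs handle divergent branches in hardware-specific ways, so this illustrates the reasoning rather than the behavior of any particular device. The names simulate_sum, warp_size, max_steps and MAX_LANES are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LANES 64 /* upper bound on the modelled warp size (assumption) */

/* Toy lockstep model of the summing kernel. Returns true and stores the
   sum if every work-item finishes; returns false if some warp is still
   spinning after max_steps loop iterations, which stands in for a hang. */
bool simulate_sum(const float *numbers, size_t total, size_t warp_size,
                  size_t max_steps, float *sum_out)
{
    assert(warp_size >= 1 && warp_size <= MAX_LANES);
    int semaphore = 0;
    float sum = 0.0f;

    for (size_t base = 0; base < total; base += warp_size) {
        size_t lanes = (total - base < warp_size) ? total - base : warp_size;
        bool holds_lock[MAX_LANES] = { false };
        size_t spinning = lanes;
        size_t steps = 0;

        /* acquire(): the warp stays in the loop until no lane is spinning. */
        while (spinning > 0) {
            if (++steps > max_steps)
                return false; /* give up: modelled hang */
            for (size_t lane = 0; lane < lanes; ++lane) {
                if (holds_lock[lane])
                    continue; /* masked off, waiting at the loop exit */
                int occupied = semaphore; /* models atom_xchg(semaphore, 1) */
                semaphore = 1;
                if (occupied == 0) {
                    holds_lock[lane] = true;
                    --spinning;
                }
            }
        }

        /* Critical section and release(): reached only after every lane
           of this warp has left the acquire loop. */
        for (size_t lane = 0; lane < lanes; ++lane)
            sum += numbers[base + lane];
        semaphore = 0;
    }

    *sum_out = sum;
    return true;
}
```

In this model a warp of one lane always takes the free lock, adds its number and releases, so the sum completes. A warp of two or more lanes stalls: the first lane takes the lock, the remaining lanes never see it free, and the lock holder is never allowed past the loop to release it.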

© 2021 The Archive Base. All Rights Reserved