The Archive Base

Editorial Team
Asked: May 21, 2026

I’m working on my game project (a tower defense game) and I’m trying to compute the distance between all critters and a tower with JCuda, using shared memory. For each tower I run one block with N threads, where N equals the number of critters on the map. Within a block I compute the distance between every critter and that block's tower, and I store the smallest distance found so far in the block’s shared memory. My current code looks like this:

extern "C"
__global__ void calcDistance(int** globalInputData, int size, int critters,
                             int** globalQueryData, int* globalOutputData) {

  //shared memory
  __shared__ float minimum[2];

  int x = threadIdx.x  + blockIdx.x * blockDim.x;
  int y = blockIdx.y;

  if (x < critters) {

    int distance = 0;
    //Calculate the distance between tower and critter
    for (int i = 0; i < size; i++) {
      int d = globalInputData[x][i] - globalQueryData[y][i];
      distance += d * d;
    }

    if (x == 0) {        
      minimum[0] = distance;
      minimum[1] = x;
    }

    __syncthreads();

    if (distance < minimum[0]) {
      minimum[0] = distance;
      minimum[1] = x;
    }
   
    __syncthreads();
    globalOutputData[y * 2]     = minimum[0];
    globalOutputData[y] = minimum[1];

  }
}

The problem is that if I rerun the code several times with the same input (I free all memory on both host and device after each run), I get different output on each run whenever the number of blocks (towers) is greater than 27. I’m fairly sure it has something to do with the shared memory and the way I’m handling it, because rewriting the code to use global memory gives the same result on every run. Any ideas?

1 Answer

    Editorial Team
    Answered: May 21, 2026 at 11:17 am

    There is a memory race (a read-after-write correctness problem) in this part of the kernel:

       if (distance < minimum[0]) {
         minimum[0] = distance;
         minimum[1] = x;
       }
    

    When this executes, every thread in the block tries to read and write minimum simultaneously. There is no guarantee which value survives when multiple threads in a warp write to the same shared memory location, and no guarantee which value other warps in the same block will read from a location that is being written. The compare and the store are not a single atomic operation, and there is no locking or serialization that would make this code perform the kind of reduction you seem to be attempting.
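
One way to remove this race is to make the compare-and-update a single atomic operation. The following is a minimal sketch, not the asker's code, under these assumptions: critter and tower coordinates are passed as flattened row-major int arrays instead of int** (often simpler to set up from JCuda), there is one thread per critter with critters <= blockDim.x, and the grid is dim3(1, towers). The kernel name calcDistanceAtomic and the host helper runAtomic are hypothetical. Shared-memory atomicMin on int requires compute capability 1.2 or higher.

```cuda
#include <climits>
#include <cuda_runtime.h>

// Sketch: race-free block minimum using shared-memory atomicMin.
// Assumptions (not from the question): flattened row-major coordinates,
// critters <= blockDim.x, grid = dim3(1, towers), one block per tower.
extern "C" __global__ void calcDistanceAtomic(const int* critterCoords, const int* towerCoords,
                                              int size, int critters, int* out)
{
    __shared__ int minDist;  // smallest squared distance seen in this block
    __shared__ int minIdx;   // lowest critter index that attains minDist
    int x = threadIdx.x;     // one thread per critter
    int y = blockIdx.y;      // one block per tower

    if (x == 0) { minDist = INT_MAX; minIdx = INT_MAX; }
    __syncthreads();         // unconditional: every thread reaches it

    int d2 = INT_MAX;
    if (x < critters) {
        d2 = 0;
        for (int i = 0; i < size; i++) {
            int d = critterCoords[x * size + i] - towerCoords[y * size + i];
            d2 += d * d;
        }
        atomicMin(&minDist, d2);  // indivisible read-compare-write
    }
    __syncthreads();

    // Among threads that hit the minimum, keep the lowest critter index so the
    // result does not depend on the order in which threads happen to run.
    if (x < critters && d2 == minDist) atomicMin(&minIdx, x);
    __syncthreads();

    if (x == 0) {            // a single thread writes the block's result
        out[2 * y]     = minDist;
        out[2 * y + 1] = minIdx;
    }
}

// Hypothetical host helper for trying the kernel; error checking omitted for brevity.
void runAtomic(const int* hCrit, const int* hTow, int size, int critters, int towers,
               int threads, int* hOut)
{
    int *dCrit, *dTow, *dOut;
    cudaMalloc(&dCrit, sizeof(int) * critters * size);
    cudaMalloc(&dTow, sizeof(int) * towers * size);
    cudaMalloc(&dOut, sizeof(int) * towers * 2);
    cudaMemcpy(dCrit, hCrit, sizeof(int) * critters * size, cudaMemcpyHostToDevice);
    cudaMemcpy(dTow, hTow, sizeof(int) * towers * size, cudaMemcpyHostToDevice);
    calcDistanceAtomic<<<dim3(1, towers), threads>>>(dCrit, dTow, size, critters, dOut);
    cudaMemcpy(hOut, dOut, sizeof(int) * towers * 2, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(dCrit); cudaFree(dTow); cudaFree(dOut);
}
```

The second atomicMin pass resolves ties deterministically: if two critters are equally close, every run reports the one with the lower index.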

    A milder version of the same problem applies to the write-back to global memory at the end of the kernel:

       __syncthreads();
       globalOutputData[y * 2]     = minimum[0];
       globalOutputData[y] = minimum[1];
    

    The barrier before the writes ensures that all writes to minimum have completed, so a “final” (although inconsistent) value is stored in minimum, but then every thread in the block executes the write to global memory.

    If your intention is for each thread to compute one distance and for the minimum distance over the block to be written to global memory, you will have to either use atomic memory operations (shared memory atomics require compute capability 1.2 or higher) or write an explicit shared memory reduction. After that, only one thread should execute the write-back to global memory.
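
An explicit shared memory reduction could look roughly like the following sketch of a tree reduction over (distance, index) pairs. The kernel name calcDistanceReduce, the host helper runReduce and the THREADS constant are hypothetical; the sketch assumes flattened row-major coordinate arrays, a grid of dim3(1, towers) and a power-of-two block size. Each thread first scans a strided subset of critters, so a tower is not limited to one thread per critter. It also writes each tower's result to slots 2*y and 2*y + 1, because in the question globalOutputData[y * 2] and globalOutputData[y] overlap between blocks (block 2 writes its index to slot 2, which is where block 1 writes its distance), which is another source of run-to-run differences.

```cuda
#include <climits>
#include <cuda_runtime.h>

#define THREADS 256  // assumption: power-of-two block size used for every launch

// Sketch: explicit shared-memory tree reduction of (distance, index) pairs.
// Assumptions: flattened row-major coordinates, grid = dim3(1, towers), blockDim.x == THREADS.
extern "C" __global__ void calcDistanceReduce(const int* critterCoords, const int* towerCoords,
                                              int size, int critters, int* out)
{
    __shared__ int sDist[THREADS];
    __shared__ int sIdx[THREADS];
    int tid = threadIdx.x;
    int y = blockIdx.y;  // one block per tower

    // Each thread handles critters tid, tid + blockDim.x, ..., so critters may exceed the block size.
    int best = INT_MAX, bestIdx = INT_MAX;
    for (int x = tid; x < critters; x += blockDim.x) {
        int d2 = 0;
        for (int i = 0; i < size; i++) {
            int d = critterCoords[x * size + i] - towerCoords[y * size + i];
            d2 += d * d;
        }
        if (d2 < best) { best = d2; bestIdx = x; }  // strict < keeps the lowest index on ties
    }
    sDist[tid] = best;   // threads with no critters contribute (INT_MAX, INT_MAX)
    sIdx[tid] = bestIdx;
    __syncthreads();

    // Halve the number of active slots each step; the barrier sits outside the if.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) {
            int od = sDist[tid + s], oi = sIdx[tid + s];
            if (od < sDist[tid] || (od == sDist[tid] && oi < sIdx[tid])) {
                sDist[tid] = od;
                sIdx[tid] = oi;
            }
        }
        __syncthreads();
    }

    if (tid == 0) {  // single writer; slots 2y and 2y + 1 never overlap between towers
        out[2 * y]     = sDist[0];
        out[2 * y + 1] = sIdx[0];
    }
}

// Hypothetical host helper for trying the kernel; error checking omitted for brevity.
void runReduce(const int* hCrit, const int* hTow, int size, int critters, int towers, int* hOut)
{
    int *dCrit, *dTow, *dOut;
    cudaMalloc(&dCrit, sizeof(int) * critters * size);
    cudaMalloc(&dTow, sizeof(int) * towers * size);
    cudaMalloc(&dOut, sizeof(int) * towers * 2);
    cudaMemcpy(dCrit, hCrit, sizeof(int) * critters * size, cudaMemcpyHostToDevice);
    cudaMemcpy(dTow, hTow, sizeof(int) * towers * size, cudaMemcpyHostToDevice);
    calcDistanceReduce<<<dim3(1, towers), THREADS>>>(dCrit, dTow, size, critters, dOut);
    cudaMemcpy(hOut, dOut, sizeof(int) * towers * 2, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(dCrit); cudaFree(dTow); cudaFree(dOut);
}
```

The tie-break on the lower index makes the result independent of thread scheduling. Launching a fixed THREADS-sized block regardless of the critter count also keeps the block within hardware limits (512 threads per block on compute 1.x devices).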

    Finally, you also have a potential synchronization correctness problem that could cause the kernel to hang. __syncthreads() (which maps to the PTX bar instruction) requires every thread in the block to reach and execute it before any thread continues. Having this sort of control flow:

     if (x < critters) {
     ....
       __syncthreads();
     ....
     }
    

    can cause the kernel to hang if some threads in the block branch around the barrier and exit while others wait at it. To guarantee correct execution of a CUDA kernel, a __syncthreads() call must never sit inside a branch that diverges within the block.
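
A minimal restructuring that keeps the question's one-thread-per-critter layout is to guard only the work, never the barrier. In this sketch, the kernel name calcDistanceMasked, the host helper runMasked and MAX_CRITTERS are hypothetical, and flattened row-major arrays with blockDim.x >= critters are assumed. Threads beyond the critter count stay alive, every thread reaches the single __syncthreads(), and thread 0 then scans the per-thread distances serially.

```cuda
#include <climits>
#include <cuda_runtime.h>

#define MAX_CRITTERS 512  // assumption: critters <= blockDim.x <= MAX_CRITTERS

// Sketch: one thread per critter, with the barrier outside the `x < critters` guard.
extern "C" __global__ void calcDistanceMasked(const int* critterCoords, const int* towerCoords,
                                              int size, int critters, int* out)
{
    __shared__ int sDist[MAX_CRITTERS];
    int x = threadIdx.x;
    int y = blockIdx.y;              // one block per tower
    bool active = x < critters;      // inactive threads skip the work but not the barrier

    if (active) {
        int d2 = 0;
        for (int i = 0; i < size; i++) {
            int d = critterCoords[x * size + i] - towerCoords[y * size + i];
            d2 += d * d;
        }
        sDist[x] = d2;               // each thread writes only its own slot: no race
    }
    __syncthreads();                 // unconditional: reached by every thread in the block

    if (x == 0) {                    // one thread scans the results and writes them out
        int best = INT_MAX, bestIdx = -1;
        for (int i = 0; i < critters; i++)
            if (sDist[i] < best) { best = sDist[i]; bestIdx = i; }
        out[2 * y]     = best;
        out[2 * y + 1] = bestIdx;
    }
}

// Hypothetical host helper for trying the kernel; error checking omitted for brevity.
// `threads` must be >= critters and <= MAX_CRITTERS.
void runMasked(const int* hCrit, const int* hTow, int size, int critters, int towers,
               int threads, int* hOut)
{
    int *dCrit, *dTow, *dOut;
    cudaMalloc(&dCrit, sizeof(int) * critters * size);
    cudaMalloc(&dTow, sizeof(int) * towers * size);
    cudaMalloc(&dOut, sizeof(int) * towers * 2);
    cudaMemcpy(dCrit, hCrit, sizeof(int) * critters * size, cudaMemcpyHostToDevice);
    cudaMemcpy(dTow, hTow, sizeof(int) * towers * size, cudaMemcpyHostToDevice);
    calcDistanceMasked<<<dim3(1, towers), threads>>>(dCrit, dTow, size, critters, dOut);
    cudaMemcpy(hOut, dOut, sizeof(int) * towers * 2, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(dCrit); cudaFree(dTow); cudaFree(dOut);
}
```

The serial scan costs O(critters) work on a single thread, which may be acceptable for a few hundred critters; an explicit shared memory reduction scales better on larger maps.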

    So, in summary, it's back to the drawing board on at least three issues in the current code.

Related Questions

I am trying to find ID3V2 tags from MP3 file using jid3lib in Java.
I'm trying to convert HTML to plain text. I get many &\#8217; &\#8220; etc.
Let's say I'm outputting a post title and in our database, it's Hello Y&#8217;all
I have a string like this: La Torre Eiffel paragonata all&#8217;Everest What PHP function
link Im having trouble converting the html entites into html characters, (&# 8217;) i
That's pretty much it. I'm using Nokogiri to scrape a web page what has
I am trying to understand how to use SyndicationItem to display feed which is
Basically, what I'm trying to create is a page of div tags, each has
I am using JSon response to parse title,date content and thumbnail images and place
I'm using v2.0 of ClassTextile.php, with the following call: $testimonial_text = $textile->TextileRestricted($_POST['testimonial']); ... and

© 2021 The Archive Base. All Rights Reserved