The Archive Base

Editorial Team
Asked: May 17, 2026

I’m calculating the Euclidean distance between n-dimensional points using OpenCL. I get two lists

I’m calculating the Euclidean distance between n-dimensional points using OpenCL. I get two lists of n-dimensional points and I should return an array that contains just the distances from every point in the first table to every point in the second table.

My approach is to do the regular double loop (for every point in Table1 { for every point in Table2 { … } }) and then do the calculation for every pair of points in parallel.

The Euclidean distance is then split into 4 steps:
1. take the difference between each dimension of the two points
2. square that difference (still for every dimension)
3. sum all the values obtained in 2.
4. take the square root of the value obtained in 3. (this step has been omitted in this example)

Everything works like a charm until I try to accumulate the sum of all differences (namely, executing step 3. of the procedure described above, line 49 of the code below).

As test data I’m using DescriptorLists with 2 points each:
DescriptorList1: 001,002,003,…,127,128; (p1)
129,130,131,…,255,256; (p2)

DescriptorList2: 000,001,002,…,126,127; (p1)
128,129,130,…,254,255; (p2)

So the resulting vector should have the values: 128, 2064512, 2130048, 128
Right now I’m getting random numbers that vary with every run.
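For checking the kernel output, a plain CPU reference can help. The following is a minimal sketch in C, not the kernel itself: the function names `squared_distance` and `all_pairs_squared` are made up for illustration, and it assumes both lists are stored as raw row-major float arrays, as in the `DescriptorList` struct.

```c
#include <assert.h>
#include <stddef.h>

/* Squared Euclidean distance between two points of `dim` floats each
   (steps 1-3 above; the square root of step 4 is omitted). */
float squared_distance(const float *p, const float *q, size_t dim)
{
    float sum = 0.0f;
    for (size_t d = 0; d < dim; d++) {
        float diff = p[d] - q[d];   /* step 1: per-dimension difference */
        sum += diff * diff;         /* steps 2 and 3: square and accumulate */
    }
    return sum;
}

/* Fills out[i * num_b + k] with the squared distance from point i of `a`
   to point k of `b`. Both lists are raw row-major arrays of `dim` floats
   per point (hypothetical helper, not part of the original code). */
void all_pairs_squared(const float *a, size_t num_a,
                       const float *b, size_t num_b,
                       size_t dim, float *out)
{
    assert(dim > 0);
    for (size_t i = 0; i < num_a; i++)
        for (size_t k = 0; k < num_b; k++)
            out[i * num_b + k] =
                squared_distance(a + i * dim, b + k * dim, dim);
}
```

Filling `a` with 1..256 and `b` with 0..255 and calling `all_pairs_squared(a, 2, b, 2, 128, out)` reproduces the four expected values listed above, since every pair differs by a constant (1, -127, 129, or 1) in each of the 128 dimensions.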

I appreciate any help or leads on what I’m doing wrong. Hopefully everything is clear about the scenario I’m working in.

#define BLOCK_SIZE 128

typedef struct
{
    //How large each point is
    int length;
    //How many points in every list
    int num_elements;
    //Pointer to the elements of the descriptor (stored as a raw array)
    __global float *elements;
} DescriptorList;

__kernel void CompareDescriptors_deb(__global float *C, DescriptorList A, DescriptorList B, int elements, __local float As[BLOCK_SIZE])
{

    int gpidA = get_global_id(0);

    int featA = get_local_id(0);

    //temporary array  to store the difference between each dimension of 2 points
    float dif_acum[BLOCK_SIZE];

    //counter to track the iterations of the inner loop
    int loop = 0;

    //loop over all descriptors in A
    for (int i = 0; i < A.num_elements/BLOCK_SIZE; i++){

        //take the i-th descriptor. Returns a DescriptorList with just the i-th
        //descriptor in DescriptorList A
        DescriptorList tmpA = GetDescriptor(A, i);

        //copy the current descriptor to local memory.
        //returns one element of the only descriptor in DescriptorList tmpA
        //and index featA
        As[featA] = GetElement(tmpA, 0, featA);
        //wait for all the threads to finish copying before continuing
        barrier(CLK_LOCAL_MEM_FENCE);

        //loop over all the descriptors in B
        for (int k = 0; k < B.num_elements/BLOCK_SIZE; k++){
            //take the difference of both current points
            dif_acum[featA] = As[featA]-B.elements[k*BLOCK_SIZE + featA];
            //wait again
            barrier(CLK_LOCAL_MEM_FENCE);
            //square value of the difference in dif_acum and store in C
            //which is where the results should be stored at the end.
            C[loop] = 0;
            C[loop] += dif_acum[featA]*dif_acum[featA];
            loop += 1;
            barrier(CLK_LOCAL_MEM_FENCE);
        }
    }
}
1 Answer

  1. Editorial Team
    Added an answer on May 17, 2026 at 12:39 am

    Your problem lies in these lines of code:

    C[loop] = 0;
    C[loop] += dif_acum[featA]*dif_acum[featA];
    

    All threads in your workgroup (actually all your threads, but more on that later) are trying to modify this array position concurrently without any synchronization whatsoever. Several factors make this really problematic:

    1. The workgroup is not guaranteed to execute completely in lockstep, meaning that some threads can execute C[loop] = 0 after other threads have already executed the next line.
    2. Those that do execute in parallel all read the same value from C[loop], add their own increment, and try to write back to the same address. I’m not completely sure what the result of that write-back is (I think one of the threads succeeds in writing back while the others fail), but it’s wrong either way.
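To make the second point concrete, here is a small CPU model in C of one possible schedule. It is only an illustration under the assumption that every work-item reads the old value before any of them writes. The real outcome of such a race is not defined, and the function names are hypothetical.

```c
#include <stddef.h>

/* Models `n` work-items that all execute `C[loop] += inc[t]` without
   synchronization, under ONE assumed schedule: every work-item reads the
   shared value first, then they write back one after another. */
float racy_accumulate(const float *inc, size_t n)
{
    float shared = 0.0f;            /* stands in for C[loop] */
    const float snapshot = shared;  /* phase 1: everyone reads the same old value */
    for (size_t t = 0; t < n; t++)
        shared = snapshot + inc[t]; /* phase 2: each write overwrites the previous one */
    return shared;                  /* only the last increment survives */
}

/* What the code was meant to compute: a properly ordered sum. */
float serial_accumulate(const float *inc, size_t n)
{
    float sum = 0.0f;
    for (size_t t = 0; t < n; t++)
        sum += inc[t];
    return sum;
}
```

With 128 increments of 1.0, the modeled race returns 1.0 while the ordered sum returns 128.0. On real hardware the lost updates depend on scheduling, which fits the values changing from run to run.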

    Now let’s fix this:
    While we might be able to get this to work on global memory using atomics, it won’t be fast, so let’s accumulate in local memory:

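    local float* accum;   // one slot per work-item
    ...
    accum[featA] = dif_acum[featA]*dif_acum[featA];
    barrier(CLK_LOCAL_MEM_FENCE);
    // tree reduction: the stride doubles each step (BLOCK_SIZE must be a power of two)
    for(unsigned int i = 1; i < BLOCK_SIZE; i *= 2)
    {
        if ((featA % (2*i)) == 0)
            accum[featA] += accum[featA + i];
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    // accum[0] now holds the full sum; a single work-item writes it out
    if(featA == 0)
        C[loop] = accum[0];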
    

    Of course you can reuse other local buffers for this, but I think the point is clear. (By the way: are you sure that dif_acum ends up in local memory? An array declared inside a kernel without the __local qualifier lives in private memory, which would make preloading A into local memory kind of pointless.)
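To see why the stride-doubling loop leaves the total in accum[0], the reduction can be modeled on the CPU. This is a sketch in C under the assumption that the number of slots is a power of two: the outer loop stands for one barrier-separated step, and the inner loop plays the role of the work-items. `tree_reduce` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* CPU model of the stride-doubling reduction. Within one step, the active
   slots (multiples of 2*stride) only read slots that are not written in
   that step, so the order of the inner loop does not matter. */
float tree_reduce(float *accum, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);   /* n must be a power of two */
    for (size_t stride = 1; stride < n; stride *= 2) {
        for (size_t id = 0; id < n; id++) {
            if (id % (2 * stride) == 0)
                accum[id] += accum[id + stride];
        }
        /* in the kernel, barrier(CLK_LOCAL_MEM_FENCE) separates the steps here */
    }
    return accum[0];
}
```

For 128 slots this takes 7 steps instead of 128 sequential additions, and after the last step slot 0 holds the sum of all slots.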

    Some other points about this code:

    1. Your code seems to be geared to using only one workgroup (you aren’t using either the group id or the global id to decide which items to work on); for optimal performance you might want to use more than that.
    2. It might be personal preference, but to me it seems better to use get_local_size(0) for the workgroup size than to use a define (since you might change it in the host code without realizing you should have changed your OpenCL code too).
    3. The barriers in your code are all unnecessary, since no thread accesses an element in local memory that was written by another thread. Therefore you don’t need to use local memory for this.

    Considering the last bullet you could simply do:

    float As = GetElement(tmpA, 0, featA);
    ...
    float dif_acum = As-B.elements[k*BLOCK_SIZE + featA];
    

    This would make the code (not considering the first two bullets):

    __kernel void CompareDescriptors_deb(__global float *C, DescriptorList A, DescriptorList B, int elements, __local float accum[BLOCK_SIZE])
    {
       int gpidA = get_global_id(0);
       int featA = get_local_id(0);
       int loop = 0;
       for (int i = 0; i < A.num_elements/BLOCK_SIZE; i++){
           DescriptorList tmpA = GetDescriptor(A, i);
           float As = GetElement(tmpA, 0, featA);
           for (int k = 0; k < B.num_elements/BLOCK_SIZE; k++){
               // each work-item handles one dimension of the current pair
               float dif_acum = As-B.elements[k*BLOCK_SIZE + featA];

               // dif_acum is a scalar here, so it is no longer indexed
               accum[featA] = dif_acum*dif_acum;
               barrier(CLK_LOCAL_MEM_FENCE);
               // tree reduction over the squared differences
               // (s instead of i, to avoid shadowing the outer loop variable)
               for(unsigned int s = 1; s < BLOCK_SIZE; s *= 2)
               {
                  if ((featA % (2*s)) == 0)
                     accum[featA] += accum[featA + s];
                  barrier(CLK_LOCAL_MEM_FENCE);
               }
               if(featA == 0)
                  C[loop] = accum[0];
               barrier(CLK_LOCAL_MEM_FENCE);

               loop += 1;
            }
        }
    }
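As for the first bullet, one way to use more than one workgroup is to let each workgroup handle a single pair of points. The sketch below, in C, only shows the index arithmetic. It assumes one workgroup per pair and a result layout of C[i * num_b + k], and `PairIndex` and `pair_for_group` are hypothetical names. In the kernel, `group_id` would come from get_group_id(0).

```c
#include <stddef.h>

/* Which pair of points a workgroup is responsible for. */
typedef struct {
    size_t i;   /* index of the point in list A */
    size_t k;   /* index of the point in list B */
} PairIndex;

/* Hypothetical mapping: workgroup `group_id` handles the pair (i, k) and
   would write its reduced sum to C[group_id], i.e. C[i * num_b + k]. */
PairIndex pair_for_group(size_t group_id, size_t num_b)
{
    PairIndex p;
    p.i = group_id / num_b;
    p.k = group_id % num_b;
    return p;
}
```

With this mapping the two nested loops in the kernel disappear: the host launches num_a * num_b workgroups, and each one performs a single difference, square, and reduction.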
    