Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8433271
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 10, 20262026-06-10T06:20:30+00:00 2026-06-10T06:20:30+00:00

I have the problem that my code returns different results when comparing debug to

  • 0

I have the problem that my code returns different results when comparing debug to release. I checked that both modes use /fp:precise, so that should not be the problem. The main issue I have with this is that the complete image analysis (its an image understanding project) is completely deterministic, there’s absolutely nothing random in it.

Another issue with this is the fact that my release build actually always returns the same result (23.014 for the image), while debug returns some random value between 22 and 23, which just should not be. I’ve already checked whether it may be thread related, but the only part in the algorithm which is multi-threaded returns the precisely same result for both debug and release.

What else may be happening here?

Update1: The code I now found responsible for this behaviour:

float PatternMatcher::GetSADFloatRel(float* sample, float* compared, int sampleX, int compX, int offX)
{
    if (sampleX != compX)
    {
        return 50000.0f;
    }
    float result = 0;

    float* pTemp1 = sample;
    float* pTemp2 = compared + offX;

    float w1 = 0.0f;
    float w2 = 0.0f;
    float w3 = 0.0f;

    for(int j = 0; j < sampleX; j ++)
    {
        w1 += pTemp1[j] * pTemp1[j];
        w2 += pTemp1[j] * pTemp2[j];
        w3 += pTemp2[j] * pTemp2[j];
    }               
    float a = w2 / w3;
    result = w3 * a * a - 2 * w2 * a + w1;
    return result / sampleX;
}

Update2:
This is not reproducible with 32bit code. While debug and release code will always result in the same value for 32bit, it still is different from the 64bit release version, and the 64bit debug still returns some completely random values.

Update3:
Okay, I found it to certainly be caused by OpenMP. When I disable it, it works fine. (both Debug and Release use the same code, and both have OpenMP activated).

Following is the code giving me trouble:

#pragma omp parallel for shared(last, bestHit, cVal, rad, veneOffset)
for(int r = 0; r < 53; ++r)
{
    for(int k = 0; k < 3; ++k)
    {
        for(int c = 0; c < 30; ++c)
        {
            for(int o = -1; o <= 1; ++o)
            {
                /*
                r: 2.0f - 15.0f, in 53 steps, representing the radius of blood vessel
                c: 0-29, in steps of 1, representing the absorption value (collagene)
                iO: 0-2, depending on current radius. Signifies a subpixel offset (-1/3, 0, 1/3)
                o: since we are not sure we hit the middle, move -1 to 1 pixels along the samples
                */

                int offset = r * 3 * 61 * 30 + k * 30 * 61 + c * 61 + o + (61 - (4*w+1))/2;

                if(offset < 0 || offset == fSamples.size())
                {
                    continue;
                }
                last = GetSADFloatRel(adapted, &fSamples.at(offset), 4*w+1, 4*w+1, 0);
                if(bestHit > last)
                {
                    bestHit = last;
                    rad = (r+8)*0.25f;
                    cVal = c * 2;
                    veneOffset =(-0.5f + (1.0f / 3.0f) * k + (1.0f / 3.0f) / 2.0f);
                    if(fabs(veneOffset) < 0.001)
                        veneOffset = 0.0f;
                }
                last = GetSADFloatRel(input, &fSamples.at(offset), w * 4 + 1, w * 4 + 1, 0);
                if(bestHit > last)
                {
                    bestHit = last;
                    rad = (r+8)*0.25f;
                    cVal = c * 2;
                    veneOffset = (-0.5f + (1.0f / 3.0f) * k + (1.0f / 3.0f) / 2.0f);
                    if(fabs(veneOffset) < 0.001)
                        veneOffset = 0.0f;
                }
            }
        }
    }
}

Note: with Release mode and OpenMP activated I get the same result as with deactivating OpenMP. Debug mode and OpenMP activated gets a different result, OpenMP deactivated gets the same result as with Release.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-10T06:20:31+00:00Added an answer on June 10, 2026 at 6:20 am

    To elaborate on my comment, this is the code that is most probably the root of your problem:

    #pragma omp parallel for shared(last, bestHit, cVal, rad, veneOffset)
    {
        ...
        last = GetSADFloatRel(adapted, &fSamples.at(offset), 4*w+1, 4*w+1, 0);
        if(bestHit > last)
        {
    

    last is only assigned to before it is read again so it is a good candidate for being a lastprivate variable, if you really need the value from the last iteration outside the parallel region. Otherwise just make it private.

    Access to bestHit, cVal, rad, and veneOffset should be synchronised by a critical region:

    #pragma omp critical
    if (bestHit > last)
    {
        bestHit = last;
        rad = (r+8)*0.25f;
        cVal = c * 2;
        veneOffset =(-0.5f + (1.0f / 3.0f) * k + (1.0f / 3.0f) / 2.0f);
        if(fabs(veneOffset) < 0.001)
            veneOffset = 0.0f;
    }
    

    Note that by default all variables, except the counters of parallel for loops and those defined inside the parallel region, are shared, i.e. the shared clause in your case does nothing unless you also apply the default(none) clause.

    Another thing that you should be aware of is that in 32-bit mode Visual Studio uses x87 FPU math while in 64-bit mode it uses SSE math by default. x87 FPU does intermediate calculations using 80-bit floating point precision (even for calculations involving float only) while the SSE unit supports only the standard IEEE single and double precisions. Introducing OpenMP or any other parallelisation technique to a 32-bit x87 FPU code means that at certain points intermediate values should be converted back to the single precision of float and if done sufficiently many times a slight or significant difference (depending on the numerical stability of the algorithm) could be observed between the results from the serial code and the parallel one.

    Based on your code, I would suggest that the following modified code would give you good parallel performance because there is no synchronisation at each iteration:

    #pragma omp parallel private(last)
    {
        int rBest = 0, kBest = 0, cBest = 0;
        float myBestHit = bestHit;
    
        #pragma omp for
        for(int r = 0; r < 53; ++r)
        {
            for(int k = 0; k < 3; ++k)
            {
                for(int c = 0; c < 30; ++c)
                {
                    for(int o = -1; o <= 1; ++o)
                    {
                        /*
                        r: 2.0f - 15.0f, in 53 steps, representing the radius of blood vessel
                        c: 0-29, in steps of 1, representing the absorption value (collagene)
                        iO: 0-2, depending on current radius. Signifies a subpixel offset (-1/3, 0, 1/3)
                        o: since we are not sure we hit the middle, move -1 to 1 pixels along the samples
                        */
    
                        int offset = r * 3 * 61 * 30 + k * 30 * 61 + c * 61 + o + (61 - (4*w+1))/2;
    
                        if(offset < 0 || offset == fSamples.size())
                        {
                            continue;
                        }
                        last = GetSADFloatRel(adapted, &fSamples.at(offset), 4*w+1, 4*w+1, 0);
                        if(myBestHit > last)
                        {
                            myBestHit = last;
                            rBest = r;
                            cBest = c;
                            kBest = k;
                        }
                        last = GetSADFloatRel(input, &fSamples.at(offset), w * 4 + 1, w * 4 + 1, 0);
                        if(myBestHit > last)
                        {
                            myBestHit = last;
                            rBest = r;
                            cBest = c;
                            kBest = k;
                        }
                    }
                }
            }
        }
        #pragma omp critical
        if (bestHit > myBestHit)
        {
            bestHit = myBestHit;
            rad = (rBest+8)*0.25f;
            cVal = cBest * 2;
            veneOffset =(-0.5f + (1.0f / 3.0f) * kBest + (1.0f / 3.0f) / 2.0f);
            if(fabs(veneOffset) < 0.001)
            veneOffset = 0.0f;
        }
    }
    

    It only stores the values of the parameters that give the best hit in each thread and then at the end of the parallel region it computes rad, cVal and veneOffset based on the best values. Now there is only one critical region, and it is at the end of code. You can get around it also, but you would have to introduce additional arrays.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have a problem that has been driving me batty. I have code that
i have the following problem with that code in my html tag: <div id=searchbar><input
I recently have a problem that my java code works perfectly ok on my
my first question here! I have a problem with a piece of code that
I have a bit of a problem that I can't seem to code my
I have some code that I'd like to run on a page. My problem
I have a dynamic list code that works just fine. My only problem now
Hi I have a code example available here: http://jsbin.com/oxoweh My problem is that I
So I have the following code, the problem is that it exits before all
I have strange problem with the Google Maps DirectionsService. It returns me different routes

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.