I try to write simple application using OpenMP. Unfortunately I have problem with speedup.

Asked: June 3, 2026

I am trying to write a simple application using OpenMP. Unfortunately, I have a problem with speedup.
The application has one while loop. The body of this loop consists of some instructions that must be executed sequentially, followed by one for loop. I use #pragma omp parallel for to parallelize this for loop. The loop doesn’t do much work, but it is executed very often.

I prepared two versions of the for loop and ran the application on 1, 2, and 4 cores.
Version 1 (4 iterations in the for loop): 22 sec, 23 sec, 26 sec.
Version 2 (100000 iterations in the for loop): 20 sec, 10 sec, 6 sec.

As you can see, when the for loop doesn’t have much work, the time on 2 and 4 cores is higher than on 1 core.
I guess the reason is that #pragma omp parallel for creates new threads in each iteration of the while loop. So I would like to ask: is it possible to create the threads once (before the while loop) and still ensure that some of the work inside the while loop is done sequentially?

#include <omp.h>
#include <iostream>
#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
int main(int argc, char* argv[])
{
    double sum = 0;
    while (true)
    {
        // ...
        // some work which should be done sequentially
        // ...

        #pragma omp parallel for num_threads(atoi(argv[1])) reduction(+:sum)
        for(int j=0; j<4; ++j)  // version 2: for(int j=0; j<100000; ++j)
        {
            double x = pow(j, 3.0);
            x = sqrt(x);
            x = sin(x);
            x = cos(x);
            x = tan(x);
            sum += x;

            double y = pow(j, 3.0);
            y = sqrt(y);
            y = sin(y);
            y = cos(y);
            y = tan(y);
            sum += y;

            double z = pow(j, 3.0);
            z = sqrt(z);
            z = sin(z);
            z = cos(z);
            z = tan(z);
            sum += z;
        }

        if (sum > 100000000)
        {
            break;
        }
    }
    return 0;
}

1 Answer

Editorial Team · Answer 1 · 2026-06-03T23:03:19+00:00

You could move the parallel region outside of the while (true) loop and use the single directive to make the serial part of the code execute in one thread only. This removes the fork/join overhead that is otherwise paid on every pass of the while loop. Also, OpenMP is not really useful on tight loops with a very small number of iterations (like your version 1). You are basically measuring the OpenMP overhead, since the work inside the loop is done really fast: even 100000 iterations with transcendental functions take less than a second on a current-generation CPU (at 2 GHz and roughly 100 cycles per FP instruction other than addition, it’ll take ~100 ms).
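A minimal sketch of that restructuring, under some assumptions: the function name run_persistent_region, the helper term, and the shared flag done are hypothetical names introduced here for illustration, and the sequential work is reduced to a pass counter. The single constructs and the for construct each end with an implicit barrier, which is what keeps the shared variables consistent between threads. A num_threads clause, as in the question, could be added to the parallel directive.

```cpp
#include <cmath>

// Hypothetical helper: the same chain of calls as one block in the question's loop body.
static double term(int j)
{
    double x = std::pow(static_cast<double>(j), 3.0);
    return std::tan(std::cos(std::sin(std::sqrt(x))));
}

// Hypothetical function: runs the whole while loop inside ONE parallel region.
// Returns how many passes of the outer loop were needed for sum to exceed limit.
int run_persistent_region(int inner_iters, double limit)
{
    double sum = 0.0;   // shared; target of the reduction
    int passes = 0;     // shared; written only inside single
    bool done = false;  // shared; written only inside single

    #pragma omp parallel
    {
        while (!done)
        {
            #pragma omp single
            {
                // Sequential part: executed by exactly one thread.
                ++passes;
            }   // implicit barrier: the other threads wait here

            // Work-sharing loop: reuses the existing team, no new fork/join.
            #pragma omp for reduction(+:sum)
            for (int j = 0; j < inner_iters; ++j)
            {
                sum += term(j);
            }   // implicit barrier: sum is fully combined after this point

            #pragma omp single
            {
                done = (sum > limit);
            }   // implicit barrier: every thread sees the same value of done
        }
    }
    return passes;
}
```

Calling run_persistent_region(4, 100000000.0) would correspond to version 1 of the question. Because the threads are created once, the remaining per-pass cost is the barriers, so very small loops may still not show a speedup.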

That’s why OpenMP provides the if(condition) clause that can be used to selectively turn off the parallelisation for small loops:

#pragma omp parallel for ... if(loopcnt > 10000)
for (i = 0; i < loopcnt; i++)
   ...

It is also advisable to use schedule(static) for regular loops (that is, loops in which every iteration takes about the same time to compute).
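Put together, the two clauses might look like the sketch below. The function name sum_terms and the loop body are hypothetical and chosen only for illustration; the threshold of 10000 is taken from the snippet above and would need tuning on a real machine.

```cpp
#include <cmath>

// Hypothetical example: sums sqrt(i) for i in [0, loopcnt).
// The loop runs in parallel only when loopcnt is large enough to amortize
// the OpenMP overhead; otherwise a single thread executes it.
double sum_terms(int loopcnt)
{
    double sum = 0.0;

    // schedule(static): iterations are divided into equal contiguous chunks,
    // which suits loops where every iteration costs about the same.
    #pragma omp parallel for reduction(+:sum) schedule(static) if(loopcnt > 10000)
    for (int i = 0; i < loopcnt; i++)
    {
        sum += std::sqrt(static_cast<double>(i));
    }
    return sum;
}
```

With loopcnt = 4 the if clause evaluates to false and the loop is executed by one thread, so no time is spent distributing four iterations across a team.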
