I’m having a look at some code that uses OpenMP, though I’m not too familiar


Asked: June 16, 2026 (2026-06-16T15:54:32+00:00)


I’m having a look at some code that uses OpenMP, though I’m not too familiar with it. (Neither the code nor OpenMP.)

When running a profiler against it, I see that the program is supposedly spending about 20% of wall-clock time in an “OMP implicit barrier” function.

Is that typical of OpenMP, or does that (maybe) imply that the workload is not distributed evenly among the threads?

Thanks


1 Answer

Editorial Team · Answer 1 · 2026-06-16T15:54:33+00:00

There is an implicit barrier at the end of most OpenMP worksharing constructs, such as for (in C/C++) or do (in Fortran), sections, and single; the master construct has no such barrier. The nowait clause removes these implicit barriers, provided the algorithm allows the threads to run desynchronised after the worksharing directive. Another implicit barrier sits at the end of each parallel region as part of the fork/join execution model, and that one cannot be removed.
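As a rough illustration of the nowait clause, here is a minimal sketch in C. The function name fill_independent and both arrays are made up for this example and do not come from the code in the question. It assumes the two loops touch completely separate data, which is the only reason dropping the first barrier is safe.

```c
/* Hypothetical helper (not from the original code): fills two unrelated
 * arrays inside a single parallel region. */
void fill_independent(double *a, double *b, int n)
{
    #pragma omp parallel
    {
        /* nowait drops the implicit barrier at the end of this loop.
         * This is safe only because the next loop never touches a[]. */
        #pragma omp for nowait
        for (int i = 0; i < n; ++i)
            a[i] = 2.0 * i;

        /* Threads that finish their share of the first loop early start
         * here immediately instead of idling at a barrier. */
        #pragma omp for
        for (int i = 0; i < n; ++i)
            b[i] = i + 1.0;
    } /* the implicit barrier at the end of the parallel region remains */
}
```

With GCC or Clang this needs -fopenmp to actually run in parallel; without the flag the pragmas are ignored and the function runs serially with the same result. If the second loop read a[], the nowait would introduce a data race and would have to go.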

You have guessed correctly: a high percentage of time spent waiting in implicit barriers usually means that the work is far from evenly shared. There may be (lots of) large single constructs, during which all other threads sit idle, or there may be parallel loops (for/do constructs) whose iterations take varying amounts of time. If the imbalance comes from loops with varying computational time per iteration (the canonical example is drawing the Mandelbrot set), the loop schedule can be changed to dynamic using the schedule(dynamic,chunk) clause, where chunk is the chunk size (>= 1). The smaller the chunk size, the better the load balance, but the higher the overhead from the dynamic loop dispatcher. The bigger the chunk size, the lower the overhead, but the more load imbalance appears. The optimal value often depends on the kind of problem and on the hardware, so one has to tweak it to get the best performance on the particular system where the code executes.
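To make the Mandelbrot case concrete, here is a small sketch of a row loop with a dynamic schedule. The names mandel_iters and mandel_rows, the viewing window, and the output layout are all assumptions made for this example, not anything taken from the code in the question.

```c
/* Number of iterations before the point c = cr + i*ci escapes,
 * capped at max_iter. The cost varies a lot from point to point. */
static int mandel_iters(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int k = 0;
    while (k < max_iter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        ++k;
    }
    return k;
}

/* Hypothetical driver: one image row per loop iteration.
 * The window [-2,1] x [-1.5,1.5] is an arbitrary choice.
 * chunk must be >= 1. */
void mandel_rows(int *out, int width, int height, int max_iter, int chunk)
{
    /* Rows are handed out in blocks of `chunk` to whichever thread
     * becomes free, so expensive rows do not pile up on one thread. */
    #pragma omp parallel for schedule(dynamic, chunk)
    for (int y = 0; y < height; ++y) {
        double ci = -1.5 + 3.0 * y / height;
        for (int x = 0; x < width; ++x) {
            double cr = -2.0 + 3.0 * x / width;
            out[y * width + x] = mandel_iters(cr, ci, max_iter);
        }
    }
}
```

To experiment with chunk sizes without recompiling, one option is to write schedule(runtime) instead and set the OMP_SCHEDULE environment variable, for example to "dynamic,4", before each run.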
