This is in F90, but the question holds for any language with OpenMP support.

Question

Asked: June 15, 2026

This is in F90, but the question holds for any language with OpenMP support. A typical way of structuring data for a simulation code that needs multiple storage arrays for time integration would be (two-dimensional for now):

REAL, DIMENSION(imax,jmax,n_sub_timesteps) :: vars

Which would then be updated with something like:

DO J = 1, jmax
  DO I = 1, imax
    vars(I,J,2) = func(vars(:,:,1))
  END DO
END DO

In my experience, OpenMP won't actually parallelize those loops because it thinks vars is not thread-safe. But to the programmer, it obviously is.

Let's also assume that in real cases making vars thread-local would be far too expensive, because of all the data that would have to be copied into it.

So, is there a way to gently hint (aka coerce) OpenMP into not locking vars, given that it may not figure out that there are no thread dependency issues even though there really aren't any? I know there are ways to tell it that something is not thread-safe and needs locking, but is there a way to specify the inverse without making a copy for each thread?

1 Answer

Editorial Team · Answer 1 · 2026-06-15T09:47:04+00:00

It looks like you are mistaking OpenMP for automatic parallelisation. I am not aware of any OpenMP implementation that performs data locking unless explicitly told to do so by a CRITICAL section or an ATOMIC statement (or at the end of a parallel region with a REDUCTION clause). OpenMP compilers do not examine your code for possible data dependencies and do not prevent you from running in parallel; this is left entirely to you. If you want unprotected concurrent access, you can have it, and no OpenMP-enabled compiler will stop you. The following code would always produce a parallel region and would distribute the outer loop among the threads in the team:

!$OMP PARALLEL DO PRIVATE(I)
DO J = 1, jmax
  DO I = 1, imax
    vars(I,J,2) = func(vars(:,:,1))
  END DO
END DO
!$OMP END PARALLEL DO
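To make this concrete, here is a minimal self-contained sketch of the same pattern. The array sizes, the three-point average used for `func`, and its extra `i, j` arguments are assumptions made purely for illustration; they are not taken from the question. The program updates the shared array in parallel, repeats the update serially, and compares the two results.

```fortran
program shared_update
  implicit none
  ! Sizes are arbitrary illustration values, not from the question.
  integer, parameter :: imax = 64, jmax = 48, n_sub_timesteps = 2
  real :: vars(imax, jmax, n_sub_timesteps)
  real :: ref(imax, jmax)
  integer :: i, j

  ! Fill stage 1 with recognisable data.
  do j = 1, jmax
    do i = 1, imax
      vars(i, j, 1) = real(i) + 10.0 * real(j)
    end do
  end do

  ! vars is shared: no per-thread copy is made and no lock is taken.
  ! Each element of stage 2 is written by exactly one thread and
  ! stage 1 is only read, so the loop has no data race.
  !$omp parallel do default(none) shared(vars) private(i, j)
  do j = 1, jmax
    do i = 1, imax
      vars(i, j, 2) = func(vars(:, :, 1), i, j)
    end do
  end do
  !$omp end parallel do

  ! Serial reference for comparison.
  do j = 1, jmax
    do i = 1, imax
      ref(i, j) = func(vars(:, :, 1), i, j)
    end do
  end do

  print *, 'parallel matches serial: ', &
       maxval(abs(vars(:, :, 2) - ref)) <= 1.0e-3

contains

  ! Hypothetical stand-in for the question's func: a three-point
  ! average along i, with the neighbours clamped at the boundaries.
  pure function func(a, i, j) result(r)
    real, intent(in) :: a(:, :)
    integer, intent(in) :: i, j
    real :: r
    integer :: im, ip
    im = max(i - 1, 1)
    ip = min(i + 1, size(a, 1))
    r = (a(im, j) + a(i, j) + a(ip, j)) / 3.0
  end function func

end program shared_update
```

Built with OpenMP enabled (for example `gfortran -fopenmp`), the outer loop is split among the threads; built without it, the directives are treated as plain comments and the program runs serially. The `default(none)` clause is optional, but it forces every variable's sharing to be spelled out, which makes it obvious that `vars` is shared rather than copied.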

On the other hand, the built-in automatic parallelisers in most compilers are very conservative and usually would not parallelise a case like yours without explicit hints from the programmer. Those hints usually take the form of compiler-specific directives (formatted as comments in Fortran or as pragmas in C/C++). For example, Intel Fortran supports the !DEC$ PARALLEL directive, which tells it to ignore assumed data dependencies in the loop that follows the directive:

!DEC$ PARALLEL
DO J = 1, jmax
  DO I = 1, imax
    vars(I,J,2) = func(vars(:,:,1))
  END DO
END DO

Many compilers reuse their OpenMP implementations and runtime libraries to implement the automatic parallelisation feature, so the behaviour of the resulting executables is usually controlled by OpenMP environment variables such as OMP_NUM_THREADS.
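A quick way to see what the runtime picked up from the environment is a small check like the one below. It is only a sketch for an explicitly OpenMP-compiled program; whether an auto-parallelised executable honours the same variable depends on the compiler.

```fortran
program show_threads
  use omp_lib, only: omp_get_max_threads, omp_get_num_threads
  implicit none

  ! Upper bound on the team size for the next parallel region,
  ! normally taken from OMP_NUM_THREADS when that variable is set.
  print '(a,i0)', 'max threads: ', omp_get_max_threads()

  !$omp parallel
  !$omp single
  ! Actual size of the team that was created; printed by one thread only.
  print '(a,i0)', 'team size:   ', omp_get_num_threads()
  !$omp end single
  !$omp end parallel
end program show_threads
```

Running it as, say, `OMP_NUM_THREADS=4 ./show_threads` and then with a different value shows whether the setting reaches the runtime. If the team size stays at 1, the code was most likely built without the OpenMP flag or the runtime was limited in some other way.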

If your parallel OpenMP program runs slower than expected, there are many other possible reasons, mostly related to false sharing, cache thrashing, TLB thrashing, memory bandwidth limitations, non-local memory access on NUMA systems, using non-temporal loads/stores to shared variables, etc. Any of these can make it look as if OpenMP is performing automatic data locking, but it isn't.
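As one illustration of the NUMA point, a common mitigation is "first touch" initialisation: the array is initialised in parallel with the same static schedule as the compute loop, so that each thread first writes the part of the array it will later work on. The sketch below assumes an operating system that places a memory page near the thread that first writes to it (the usual default on Linux); the sizes and the trivial update are made up for the example.

```fortran
program first_touch
  implicit none
  ! Illustration sizes only.
  integer, parameter :: imax = 512, jmax = 512
  real, allocatable :: vars(:, :, :)
  integer :: i, j

  ! Allocation by itself typically does not touch the pages yet.
  allocate(vars(imax, jmax, 2))

  ! Initialise in parallel with the same static schedule as the
  ! compute loop, so each thread first writes the columns it will
  ! later update.
  !$omp parallel do schedule(static) private(i)
  do j = 1, jmax
    do i = 1, imax
      vars(i, j, 1) = real(i + j)
      vars(i, j, 2) = 0.0
    end do
  end do
  !$omp end parallel do

  ! Compute loop with the same distribution of j over the threads.
  !$omp parallel do schedule(static) private(i)
  do j = 1, jmax
    do i = 1, imax
      vars(i, j, 2) = 2.0 * vars(i, j, 1)
    end do
  end do
  !$omp end parallel do

  print *, 'update correct: ', all(vars(:, :, 2) == 2.0 * vars(:, :, 1))
end program first_touch
```

Keeping J as the parallelised outer loop also means each thread works on whole contiguous columns, since Fortran arrays are column-major, so any false sharing is confined to the boundaries between the threads' chunks. Whether this helps in a given case is something to measure rather than assume.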
