As the title states, I want to parallelize a sum using OpenMP. I searched for different approaches but I either do not understand what they do or they didn’t work. Here’s what I found:
1)
!$OMP PARALLEL WORKSHARE
P_pump_t = 0.5d0 * dcv / pi**2 * sum( k * p_pump_k * dk )
!$OMP END PARALLEL WORKSHARE
Works, but I don't understand what happens or what benefit I get.
2)
!$OMP PARALLEL DO REDUCTION(+:P_pump_t)
do l = 1, n
P_pump_t = P_pump_t + 0.5d0 * dcv / pi**2 * k(l) * p_pump_k(l) * dk(l)
end do
!$OMP END PARALLEL DO
Gives wrong (different from 1) or 3)) results.
3) Of course I could compute a new array in parallel and sum it up at the end…
A hint on how to do it best?
Based on the amount of code that you share, I would guess that your comment on 2) means that the loop version gives incorrect (different?) results. This could happen if you omitted the initialisation of P_pump_t to 0.0 before the summation loop. Also note that both codes might produce slightly different results because of the non-associativity of floating-point operations: for example, (a+b)+c might produce a slightly different result from a+(b+c) because of the rounding and normalisation applied after each operation. To better match the vectorised version of your code, accumulate only k(l) * p_pump_k(l) * dk(l) inside the loop and multiply the sum by 0.5d0 * dcv / pi**2 once after the loop.
It is quite possible that ifort already extracts the common multiplication and applies it after the loop; it is pretty good at performing such optimisations.
Also note that with Intel’s OpenMP implementation the WORKSHARE directive is simply translated to SINGLE, i.e. the code actually runs in serial. On 32-bit machines that use x87 FPU instructions, one can expect different results from the serial version than from the multithreaded one because of the higher internal precision of the x87 FPU.
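A minimal sketch of the reduction loop with the initialisation in place and the common factor applied after the loop. The program name, the array size n, the value of dcv and the fill-in data are placeholders chosen only so that the example compiles and runs; they are not taken from your code.

```fortran
program pump_sum
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 100000            ! hypothetical array size
  real(dp), parameter :: pi = 3.141592653589793_dp
  real(dp) :: k(n), p_pump_k(n), dk(n)
  real(dp) :: dcv, P_pump_t, P_ref
  integer :: l

  ! Placeholder data (assumption), only so that the example runs
  dcv = 1.0_dp
  do l = 1, n
     k(l)        = real(l, dp) / real(n, dp)
     p_pump_k(l) = exp(-k(l))
     dk(l)       = 1.0_dp / real(n, dp)
  end do

  ! The reduction variable must be initialised before the parallel loop:
  ! its original value is combined with the per-thread partial sums
  P_pump_t = 0.0_dp
  !$OMP PARALLEL DO REDUCTION(+:P_pump_t)
  do l = 1, n
     P_pump_t = P_pump_t + k(l) * p_pump_k(l) * dk(l)
  end do
  !$OMP END PARALLEL DO

  ! Apply the common factor once, after the sum
  P_pump_t = 0.5_dp * dcv / pi**2 * P_pump_t

  ! Serial reference using the array intrinsic, as in variant 1)
  P_ref = 0.5_dp * dcv / pi**2 * sum(k * p_pump_k * dk)

  print *, 'reduction loop:', P_pump_t
  print *, 'intrinsic sum :', P_ref
  print *, 'difference    :', abs(P_pump_t - P_ref)
end program pump_sum
```

With gfortran this can be built with the -fopenmp flag. The two printed values may differ in the last few digits, since the threads add the terms in a different order than the serial sum does; a difference much larger than that would point to a missing initialisation or a data race rather than to rounding.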