I have a program which distributes particles into a cloud-in-cell mesh. It simply loops over the total number of particles (Ntot) and populates a 256^3 mesh (i.e. each particle gets distributed over 8 cells). I compile it with:
% gfortran -fopenmp cic.f90 -o ./cic
This compiles fine, but when I run it (./cic) I get a segmentation fault. I think my loop is a classic omp do problem. The program works when I don't compile it with OpenMP.
!$omp parallel do
do i = 1,Ntot
  if (x1(i).gt.0.and.y1(i).gt.0.and.z1(i).gt.0) then
    dense(int(x1(i)),int(y1(i)),int(z1(i))) = dense(int(x1(i)),int(y1(i)),int(z1(i))) &
      + dx1(i) * dy1(i) * dz1(i) * mpart
  end if
  if (x2(i).le.Ng.and.y1(i).gt.0.and.z1(i).gt.0) then
    dense(int(x2(i)),int(y1(i)),int(z1(i))) = dense(int(x2(i)),int(y1(i)),int(z1(i))) &
      + dx2(i) * dy1(i) * dz1(i) * mpart
  end if
  if (x1(i).gt.0.and.y2(i).le.Ng.and.z1(i).gt.0) then
    dense(int(x1(i)),int(y2(i)),int(z1(i))) = dense(int(x1(i)),int(y2(i)),int(z1(i))) &
      + dx1(i) * dy2(i) * dz1(i) * mpart
  end if
  if (x2(i).le.Ng.and.y2(i).le.Ng.and.z1(i).gt.0) then
    dense(int(x2(i)),int(y2(i)),int(z1(i))) = dense(int(x2(i)),int(y2(i)),int(z1(i))) &
      + dx2(i) * dy2(i) * dz1(i) * mpart
  end if
  if (x1(i).gt.0.and.y1(i).gt.0.and.z2(i).le.Ng) then
    dense(int(x1(i)),int(y1(i)),int(z2(i))) = dense(int(x1(i)),int(y1(i)),int(z2(i))) &
      + dx1(i) * dy1(i) * dz2(i) * mpart
  end if
  if (x2(i).le.Ng.and.y1(i).gt.0.and.z2(i).le.Ng) then
    dense(int(x2(i)),int(y1(i)),int(z2(i))) = dense(int(x2(i)),int(y1(i)),int(z2(i))) &
      + dx2(i) * dy1(i) * dz2(i) * mpart
  end if
  if (x1(i).gt.0.and.y2(i).le.Ng.and.z2(i).le.Ng) then
    dense(int(x1(i)),int(y2(i)),int(z2(i))) = dense(int(x1(i)),int(y2(i)),int(z2(i))) &
      + dx1(i) * dy2(i) * dz2(i) * mpart
  end if
  if (x2(i).le.Ng.and.y2(i).le.Ng.and.z2(i).le.Ng) then
    dense(int(x2(i)),int(y2(i)),int(z2(i))) = dense(int(x2(i)),int(y2(i)),int(z2(i))) &
      + dx2(i) * dy2(i) * dz2(i) * mpart
  end if
end do
!$omp end parallel do
There are no dependencies between iterations. Ideas?
This problem, as well as the one in your other question, comes from the fact that automatic static allocation of large arrays is disabled when OpenMP is enabled. This means that without -fopenmp, big arrays are automatically placed in static storage (known as the .bss segment) while small arrays are allocated on the stack. When you switch OpenMP support on (with gfortran, -fopenmp implies -frecursive), no automatic static allocation is used and your dense array gets allocated on the stack of the routine. The default stack limits on OS X are very restrictive, hence the segmentation fault.

You have several options here. The first option is to make dense have static allocation by giving it the SAVE attribute. The other option is to explicitly allocate it on the heap by making it ALLOCATABLE and then using the ALLOCATE statement. Newer Fortran versions support automatic deallocation of arrays without the SAVE attribute when they go out of scope.

Note that your OpenMP directive is just fine and no additional data sharing clauses are necessary. You do not need to declare i in a PRIVATE clause since loop counters have predetermined private data-sharing class. You do not need to put the other variables in a SHARED clause as they are implicitly shared. Yet the operations that you do on dense should either be synchronised with ATOMIC UPDATE (or simply ATOMIC on older OpenMP implementations) or you should use REDUCTION(+:dense). Atomic updates are translated to locked additions and should not incur much of a slowdown compared to the huge slowdown from having conditionals inside the loop. Place the directive immediately before each of the eight update statements. If your compiler complains about the UPDATE clause in the ATOMIC construct, simply delete it.

REDUCTION(+:dense) would create one copy of dense in each thread, which would consume a lot of memory, and the reduction applied in the end would grow slower and slower with the size of dense. For small arrays it would work better than atomic updates.
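The two suggestions above (heap allocation of dense and atomic updates) can be sketched together as follows. This is only a minimal illustration, not the asker's actual program: the program name, the small mesh size Ng = 16, the particle count, the random positions and the unit weights are all assumptions made up for the example, and only the first of the eight deposit cases is shown.

```fortran
program cic_atomic_sketch
  implicit none
  ! Hypothetical sizes for illustration; the question uses a 256^3 mesh.
  integer, parameter :: Ng = 16, Ntot = 1000
  real, parameter :: mpart = 1.0
  ! ALLOCATABLE so that the mesh is placed on the heap, not the stack.
  real, allocatable :: dense(:,:,:)
  real :: x1(Ntot), y1(Ntot), z1(Ntot)
  real :: dx1(Ntot), dy1(Ntot), dz1(Ntot)
  integer :: i, ix, iy, iz

  allocate(dense(Ng,Ng,Ng))
  dense = 0.0

  ! Made-up particle data: positions in [1, Ng], unit weights.
  call random_number(x1)
  call random_number(y1)
  call random_number(z1)
  x1 = 1.0 + x1 * (Ng - 1)
  y1 = 1.0 + y1 * (Ng - 1)
  z1 = 1.0 + z1 * (Ng - 1)
  dx1 = 1.0
  dy1 = 1.0
  dz1 = 1.0

  ! The loop counter i is private automatically; the scalar cell
  ! indices are helper variables and must be made private explicitly.
  !$omp parallel do private(ix, iy, iz)
  do i = 1, Ntot
    ix = int(x1(i))
    iy = int(y1(i))
    iz = int(z1(i))
    if (ix > 0 .and. iy > 0 .and. iz > 0) then
      ! Several particles may land in the same cell, so the
      ! read-modify-write on dense is protected by an atomic update.
      !$omp atomic update
      dense(ix,iy,iz) = dense(ix,iy,iz) + dx1(i) * dy1(i) * dz1(i) * mpart
    end if
  end do
  !$omp end parallel do

  ! With unit weights every particle deposits mpart exactly once,
  ! so the total should equal Ntot * mpart.
  print *, 'total deposited mass:', sum(dense)

  deallocate(dense)
end program cic_atomic_sketch
```

It can be compiled the same way as in the question, e.g. gfortran -fopenmp with this file as input. Precomputing ix, iy and iz is a choice made here for readability; keeping the int(...) expressions inline, as in the original loop, works equally well with the atomic directive.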