A compiler that implements the OpenMP standard may, but is not obliged to, exploit special hardware instructions to make certain memory updates following a #pragma omp atomic directive atomic, avoiding expensive locks. According to http://gcc.gnu.org/onlinedocs/gccint/OpenMP.html, GCC implements an atomic update as follows:
Whenever possible, an atomic update built-in is used. If that fails, a compare-and-swap loop is attempted. If that also fails, a regular critical section around the expression is used.
- How can I determine which of the three is actually used on a given machine and GCC version? Is there some verbosity option for GCC that I can set to find out without having to profile my program or look at the generated assembly?
- Is there some documentation listing CPUs/architectures that provide atomic addition/increment/etc. instructions, allowing me to predict the outcome for a given machine?
I’m using GCC versions 4.2 to 4.6 on a variety of different machines.
You may look at the intermediate tree representations with the -fdump-tree-all option. Given that option, GCC writes a set of files at several intermediate steps, and one can observe the successive transformations applied to the tree. The .ompexp file is of particular interest here, since it contains the tree just after the OpenMP expressions were expanded into their concrete implementations.
For example, the block inside the parallel region in the following simple code:
is transformed by GCC 4.7.2 on 64-bit Linux into:
which finally ends up as:
As for the second question, it might also depend on how GCC was built.