I have a loop that has been parallelized by OpenMP, but due to the nature of the task, there are 4 critical clauses.
What would be the best way to profile the speed up and find out which of the critical clauses (or maybe non-critical(!) ) take up the most time inside the loop?
I use Ubuntu 10.04 with g++ 4.4.3
OpenMP includes the functions omp_get_wtime() and omp_get_wtick() for measuring timing performance (docs here), I would recommend using these.
Otherwise try a profiler. I prefer the google CPU profiler which can be found here.
There is also the manual way described in this answer.