I am developing code for the scientific computing community, particularly for solving linear systems of equations (of the form Ax = b) iteratively.
I have used BLAS and LAPACK for the primitive matrix subroutines, but I now realize that there is some scope for manual parallelization. I am working on a shared-memory system, which leaves me with two choices: OpenMP and Pthreads.
Assuming that development time isn’t the greatest factor (and the performance of the code is), which is the better, more future-proof and perhaps more portable (to CUDA) way of parallelizing? Is the time spent on Pthreads worth the performance boost?
I believe that my application (which basically starts many things off at once and then operates on the “best” value from all of them) will benefit from explicit thread control, but I’m afraid the coding will take up too much time and in the end there will be no performance payoff.
I have already looked at a few similar questions here, but they all pertain to general applications: one concerns a generic multithreaded application on Linux, and another is a general question as well.
I am aware of SciComp.SE but felt it was more on topic here.
Your question reads as if you expect that the coding efficiency with OpenMP will be higher than with Pthreads, and the execution efficiency higher with Pthreads than with OpenMP. In general I think that you are right. However, a while back I decided that my time was more important than my computer’s time and opted for OpenMP. It’s not a decision I have had cause to regret, nor is it a decision I have any hard evidence to validate.
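To give a feel for the coding effort involved, here is a minimal OpenMP sketch of the pattern described in the question: start many independent candidates at once, then keep the “best” one. The function `run_candidate` is a hypothetical stand-in for one solver run (it just returns a made-up residual), and `best_candidate` is a name I chose for illustration; neither comes from the question. Treat it as a sketch under those assumptions, not a tuned implementation.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>

/* Hypothetical stand-in for one independent solver run: returns a
   made-up "residual" for candidate k (smallest at k = 13). */
static double run_candidate(int k)
{
    double x = 0.1 * k - 1.3;
    return x * x + 0.5;
}

/* Run n candidates in parallel and return the index of the one with the
   smallest residual; the residual itself is stored in *best_val.
   Compile with -fopenmp (GCC/Clang); without it the pragmas are ignored
   and the loop simply runs serially with the same result. */
static int best_candidate(int n, double *best_val)
{
    assert(n > 0 && best_val != NULL);

    int best_k = -1;
    double best = INFINITY;

    #pragma omp parallel
    {
        /* Each thread tracks its own best, so the loop needs no locking. */
        int local_k = -1;
        double local_best = INFINITY;

        #pragma omp for nowait
        for (int k = 0; k < n; ++k) {
            double r = run_candidate(k);
            if (r < local_best) {
                local_best = r;
                local_k = k;
            }
        }

        /* Merge the per-thread results one thread at a time. */
        #pragma omp critical
        {
            if (local_best < best) {
                best = local_best;
                best_k = local_k;
            }
        }
    }

    *best_val = best;
    return best_k;
}
```

Calling `best_candidate(32, &v)` runs the 32 candidates across the available threads and leaves the winning residual in `v`. The Pthreads equivalent would need explicit thread creation, work partitioning, a mutex for the merge step and joins, which is the extra coding effort I decided not to pay for.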
However, you are wrong to think that your choices are limited to OpenMP and Pthreads: MPI (I assume you’ve at least heard of this, post again if not) will also run on shared-memory machines. For some applications MPI can be programmed to outperform OpenMP on shared-memory computers without much difficulty.
Three (+/- a few) years ago the essential parallelisation tools in the scientific developer’s toolbox were OpenMP and MPI. Anyone using those tools was part of a large community of fellow users, larger (anecdotal evidence only) than the community of users of Pthreads and MPI. Today, with GPUs and other accelerators popping up all over the place, the situation is much more fragmented and it’s difficult to pick the winners from HMPP, ACC, Chapel, MPI-3, OpenMP4, CUDA, OpenCL, etc. I still think that OpenMP+MPI is a useful combination, but I can’t ignore the new kids on the block.
FWIW, I work on the development of computational EM codes for geophysical applications, so quite hard-core ‘scientific computing’.