I have seen an algorithm for parallel merge sort in this paper. This is the code:
void mergesort_parallel_omp (int a[], int size, int temp[], int threads)
{
    if (threads == 1) { mergesort_serial(a, size, temp); }
    else if (threads > 1)
    {
        #pragma omp parallel sections
        {
            #pragma omp section
            mergesort_parallel_omp(a, size/2, temp, threads/2);
            #pragma omp section
            mergesort_parallel_omp(a + size/2, size - size/2, temp + size/2, threads - threads/2);
        }
        merge(a, size, temp);
    } // threads > 1
}
I ran it on a multicore machine. At the leaves of the tree, only 2 threads run in parallel. After they finish their work, 2 other threads start, and so on, even though there are free cores for all the leaf nodes.
I think the reason is that this OpenMP code does not create parallel regions inside parallel regions. Am I correct?
You can have a parallel region inside a parallel region.
In order to run your code correctly, you need to call omp_set_nested(1) and omp_set_num_threads(2).
For better performance, instead of sections you can use OpenMP tasks (detailed information and examples can be found here) as follows:
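A minimal sketch of the task-based approach, not the answer's original listing. The names `mergesort_parallel_tasks` and `mergesort_tasks_rec` and the `TASK_CUTOFF` value are hypothetical and chosen only for illustration, and `merge` is a plain two-way merge written for this sketch. One thread seeds the recursion inside a single parallel region, and every idle thread in the team can pick up pending tasks, so no nested parallel regions are needed.

```c
#include <assert.h>
#include <string.h>

/* Merge the sorted halves a[0..size/2) and a[size/2..size) through temp. */
static void merge(int a[], int size, int temp[])
{
    int i = 0, j = size / 2, k = 0;
    while (i < size / 2 && j < size)
        temp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < size / 2) temp[k++] = a[i++];
    while (j < size)     temp[k++] = a[j++];
    memcpy(a, temp, (size_t)size * sizeof(int));
}

/* Hypothetical cutoff (an assumption, tune by measurement): at or below
 * this size the task is executed immediately instead of being deferred. */
#define TASK_CUTOFF 1000

static void mergesort_tasks_rec(int a[], int size, int temp[])
{
    if (size < 2) return;

    /* Each half becomes a task; a, size and temp are firstprivate by default,
     * so every task gets its own copy of the pointers and the length. */
    #pragma omp task if (size > TASK_CUTOFF)
    mergesort_tasks_rec(a, size / 2, temp);

    #pragma omp task if (size > TASK_CUTOFF)
    mergesort_tasks_rec(a + size / 2, size - size / 2, temp + size / 2);

    /* Both halves must be sorted before they are merged. */
    #pragma omp taskwait
    merge(a, size, temp);
}

/* Entry point: one parallel region, a single thread starts the task tree. */
void mergesort_parallel_tasks(int a[], int size, int temp[])
{
    assert(size >= 0);
    #pragma omp parallel
    {
        #pragma omp single
        mergesort_tasks_rec(a, size, temp);
    }
}
```

Compile with OpenMP enabled (for example `-fopenmp` with GCC) and call `mergesort_parallel_tasks(a, n, temp)` with a scratch buffer `temp` of at least `n` ints. Without OpenMP the pragmas are ignored and the same code runs serially.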
The sequential code of the merge algorithm comes from Dr. Johnnie W. Baker's webpage. However, the code that I am providing in this answer has some corrections and performance improvements.
A full running example:
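As a stand-in sketch (not the answer's original listing), here is a self-contained version of the nested-sections fix described earlier. It keeps the question's `mergesort_parallel_omp` signature and adds a hypothetical driver, `sort_with_nested_sections`, that enables nesting before sorting. It uses `omp_set_max_active_levels`, since `omp_set_nested` is deprecated as of OpenMP 5.0, and a `num_threads(2)` clause in place of `omp_set_num_threads(2)`. Both substitutions are choices made for this sketch, and some older runtimes may still need `omp_set_nested(1)` as well.

```c
#include <assert.h>
#include <string.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Merge the sorted halves a[0..size/2) and a[size/2..size) through temp. */
static void merge(int a[], int size, int temp[])
{
    int i = 0, j = size / 2, k = 0;
    while (i < size / 2 && j < size)
        temp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < size / 2) temp[k++] = a[i++];
    while (j < size)     temp[k++] = a[j++];
    memcpy(a, temp, (size_t)size * sizeof(int));
}

/* Plain recursive merge sort, used once the thread budget is spent. */
static void mergesort_serial(int a[], int size, int temp[])
{
    if (size < 2) return;
    mergesort_serial(a, size / 2, temp);
    mergesort_serial(a + size / 2, size - size / 2, temp + size / 2);
    merge(a, size, temp);
}

/* Same structure as the question's code; each level opens a 2-thread region. */
static void mergesort_parallel_omp(int a[], int size, int temp[], int threads)
{
    if (threads <= 1) { mergesort_serial(a, size, temp); return; }

    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        mergesort_parallel_omp(a, size / 2, temp, threads / 2);
        #pragma omp section
        mergesort_parallel_omp(a + size / 2, size - size / 2,
                               temp + size / 2, threads - threads / 2);
    }
    merge(a, size, temp);
}

/* Hypothetical driver: allow enough nested levels for the thread budget. */
void sort_with_nested_sections(int a[], int size, int temp[], int threads)
{
    assert(size >= 0 && threads >= 1);
#ifdef _OPENMP
    int levels = 1;                       /* ceil(log2(threads)), at least 1 */
    for (int t = 2; t < threads; t *= 2) levels++;
    omp_set_max_active_levels(levels);    /* without this, inner regions run serially */
#endif
    mergesort_parallel_omp(a, size, temp, threads);
}
```

With nesting enabled, a call such as `sort_with_nested_sections(a, n, temp, 4)` can run all four leaves at once instead of two at a time. The result is sorted either way; only the degree of parallelism changes.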
An ad hoc benchmark on a 4-core machine yields the following results:
Future improvements will be available on GitHub.
An advanced C++ version of the parallel algorithm can be found here. The final algorithm looks like the following:
A reported speedup of 6.61x for 48 threads.