I’m trying to write OpenMP solution for Insertion sort but I’m having problems to make it run in parallel and give correct results :). Is there any way to make Insertion sort it run in parallel.
Here is my code:
void insertionsort(int *A, int num)
{
// clock_t start, stop;
//
// start=clock();
int k;
#pragma omp parallel for shared(A) private(k)
for(int n = 1; n < num; n++)
{
int key = A[n];
k = n;
#pragma omp critical
for(;k>0 && A[k-1]> key;k--)
{
A[k] = A[k-1];
}
A[k] = key;
}
// stop=clock();
// cas = (double)(stop-start)/CLOCKS_PER_SEC;
}
You should not parallelize the
Insertion Sort algorithmin this manner. From the inner loop conditionA[k-1]> key;one can see that this algorithm assumes that for a givenkeyin the positionkof the array, if the actual key is bigger than the keys stored on the previous position of the array the swapping of elements stops. Hence, this algorithm assumes that the keys on the positions beforekare already sorted.When you introduce parallelism, for instance, with two threads, threads
0and1start from the beginning and middle of the array, respectively. However, the first half of the array is not sorted, according to the (wrong) assumption made by the algorithm, which consequently, leads to issues.Let me illustrate the problem, let us sort the
array = [-1,2,-3,4,-5,6,-7,8]with 2 threads:Let us fix a given execution ordered (in reality it is non-deterministic)
[-1,2,-3,4,-5,6,-7,8][-1,2,-3,4,-5,6,-7,8][-3,-1,2,4,-5,6,-7,8][-7,-3,-1,2,4,-5,6,8][-7,-3,-1,2,4,-5,6,8][-7,-3,-1,2,4,-5,6,8][-7,-3,-1,2,4,-5,6,8]Final result :
[-7,-3,-1,2,4,-5,6,8]On line 4 thread
1takes the key-7from position6and puts it in the end of the array sifting all its elements from positions1 to 6(included) one position to the right, so now-5is on the old position of-7. Since, the old position of-7(6) will never be compared again-5will stay there untouchable. Therefore, making the array unsorted.An easy, albeit poor solution, would be to add the OpenMP
orderedclause to theparallel forconstruct. However, using this constructor would basically sequentialize your code.Another possible solution, although I am not 100% sure it fits your needs, would be to make your algorithm parallel by Regular Sampling. You can see here an example of this latter technique apply on
quicksort.The struct of your algorithm is not the most suitable to be parallelize directly and achieve a good speedups. Since each iteration of the inner loop is interdependent, this will required the use of methods to unsure mutual exclusion, thus resulting in synchronization overhead. You have far better sorting algorithm that you can directly parallelize, typically the ones that use a divide-and-conquer strategy, for instance Radix Sort or Quick Sort, among others.