I was experimenting with a parallel scalar product of two vectors and measuring the elapsed time.
I compared a sequential and a parallel scalar product:
seq: double scalar(int n, double x[], double y[])
{
double sum = 0.0;
for (int i=0; i<n; i++)
{
sum += x[i]*y[i];
}
return sum;
}
parallel: double scalar_shm(int n, double x[], double y[])
{
int i;
double sum = 0.0;
#pragma omp parallel for private(i) shared(x,y) reduction(+:sum)
for (i=0; i<n; i++)
{
sum += x[i]*y[i];
}
return sum;
}
I called these one after the other:
//start the timer
tstart = omp_get_wtime();
//sequential loop
for (int n=0; n<loops; n++)
{ scalar(vlength,x,y); }
//measure sequential time
t1 = omp_get_wtime() - tstart;
//parallel loop
for (int n=0; n<loops; n++)
{ scalar_shm(vlength,x,y); }
//measure parallel time
t2 = omp_get_wtime() - t1 - tstart;
//print the times elapsed
cout<< "total time (sequential): " <<t1 <<" sec" <<endl;
cout<< "total time (parallel ): " <<t2 <<" sec" <<endl;
In every cycle I filled the vectors with random doubles; I removed that part because I consider it irrelevant.
The output for this was:
total time (sequential): 15.3439 sec
total time (parallel ): 24.5755 sec
My question is: why is the parallel one slower? What is it good for if it's slower? I expected it to be way faster, because I kind of thought that computations like this were the whole point of it.
note: I ran this on an Intel Core i7-740QM
You are creating and destroying a new parallel region on every call to scalar_shm, that is, once per iteration of the loop that calls it. Setting up and tearing down the region each time is expensive. You could try creating the parallel region outside the loop that calls scalar_shm:
Inside the scalar_shm function, the OpenMP pragma would be:
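As a rough sketch of this idea (not necessarily the exact code intended here), the repeat loop can be moved inside a single parallel region, leaving only a work-sharing `omp for` with the reduction around the dot product itself. The function name scalar_shm_repeated and the loops parameter are made up for illustration, and sum is deliberately declared outside the parallel region so that it is shared, which the reduction clause requires:

```cpp
#include <cassert>

// Hypothetical variant of scalar_shm (name and `loops` parameter are
// assumptions for illustration): computes the scalar product `loops` times,
// but opens the parallel region only once instead of once per repetition.
double scalar_shm_repeated(int n, const double x[], const double y[], int loops)
{
    assert(n >= 0 && loops >= 0);

    double sum = 0.0; // declared outside the region, so it is shared

    #pragma omp parallel
    {
        // Every thread runs the repeat loop; the work inside is split up.
        for (int rep = 0; rep < loops; rep++)
        {
            // One thread resets the shared result. The implicit barrier at
            // the end of `single` makes the others wait for the reset.
            #pragma omp single
            sum = 0.0;

            // The iterations are divided among the existing threads. The
            // partial sums are combined into `sum` by the end of the loop,
            // which also ends with an implicit barrier.
            #pragma omp for reduction(+:sum)
            for (int i = 0; i < n; i++)
            {
                sum += x[i] * y[i];
            }
        }
    }

    return sum; // result of the last repetition
}
```

Calling scalar_shm_repeated(vlength, x, y, loops) would then replace the whole parallel loop from the question. Build with OpenMP enabled (for example -fopenmp with GCC); without it the pragmas are ignored and the function simply runs sequentially with the same result.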