In an attempt to speed up my parallel code, which contains many two-level nested loops,
I created an integer array that stores the loop indices in iteration order. Each two-level nested loop then becomes a single large loop, which I expected to reduce the loop overhead.
k = 0;
for (int i=0;i<n;++i)
{
for (int j=0;j<n;++j)
{
index[k][0] = i;
index[k][1] = j;
++k; // advance to the next (i, j) slot
}
}
For example:
#pragma omp for
for (int i=0;i<n;++i)
{
for (int j=0;j<n;++j)
{
a[i][j] = 2.0*i+3.0;
}
}
turned into
#pragma omp for
for (int k=0;k<n*n;++k)
{
int i = index[k][0]; // declared inside the loop so each thread has its own copy
int j = index[k][1];
a[i][j] = 2.0*i+3.0;
}
To my surprise, the code slowed down instead of speeding up, and I don’t know why.
Loops aren’t expensive. It’s what you do inside the loop that is expensive. You’ve created a new loop which runs n*n times, so you end up executing the inner code the same number of times. You haven’t saved anything except the minuscule overhead of the inner loop.
Your new code now also reads i and j from the index array in memory on every iteration. Memory is slow, much slower than the overhead of the for loops you’ve gotten rid of.
That’s why your new version is slower than the old one.
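If the goal is still to hand OpenMP one large iteration space, it can be done without a lookup table. The sketch below is only an illustration under some assumptions: the function names are made up, and `a` is stored as a flat row-major array of n*n doubles rather than the `a[i][j]` layout in the question. It shows the original nested loop, the same loop with OpenMP's `collapse(2)` clause (available since OpenMP 3.0), and a hand-flattened loop that recovers i and j with a division and a remainder instead of two memory loads. Whether either variant actually beats the plain nested loop would have to be measured.

```c
#include <stddef.h>

/* Hypothetical helper: the original nested loop. i and j stay in
   registers, so the only memory traffic is the store to a. */
void fill_nested(double *a, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            a[(size_t)i * n + j] = 2.0 * i + 3.0;
        }
    }
}

/* Hypothetical helper: collapse(2) asks OpenMP to treat both loops
   as one n*n iteration space, with no index array to read. */
void fill_collapsed(double *a, int n)
{
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            a[(size_t)i * n + j] = 2.0 * i + 3.0;
        }
    }
}

/* Hypothetical helper: hand-flattened loop. i and j are computed
   from k arithmetically instead of being loaded from a table. */
void fill_flat(double *a, int n)
{
    long long total = (long long)n * n;
    #pragma omp parallel for
    for (long long k = 0; k < total; ++k) {
        int i = (int)(k / n);
        int j = (int)(k % n);
        a[(size_t)i * n + j] = 2.0 * i + 3.0;
    }
}
```

All three functions write the same values, so they can be swapped into a timing harness one at a time. Compiled without OpenMP support, the pragmas are ignored and the loops simply run serially.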