I would like to apply a transformation to an IplImage using OpenMP. It is a simple transformation that turns the image upside down. The code runs at the same speed with OpenMP as without it; enabling it makes no difference.
void UpsideDownFilter::filter(IplImage* dstImage) {
    uchar temp;
    int j;
    int i;
    #pragma omp parallel shared(dstImage) private(j, i, temp)
    {
        // std::cout << omp_get_thread_num() << std::endl;
        #pragma omp for schedule(static, 30) nowait
        for(j = 0; j < dstImage->height / 2; ++j) {
            for(i = 0; i < dstImage->widthStep; ++i) {
                temp = dstImage->imageData[i + j * dstImage->widthStep];
                dstImage->imageData[i + j * dstImage->widthStep] =
                    dstImage->imageData[i + (dstImage->height - 1 - j) *
                                        dstImage->widthStep];
                dstImage->imageData[i + (dstImage->height - 1 - j) *
                                    dstImage->widthStep] = temp;
            }
        }
    }
}
I’ve already tried moving the #pragma omp for to the inner loop. I’ve also tried all the other things I usually do when I don’t have a clue what’s wrong (delete this, add that). This is how I call that method from my code:
for (vector<filter_ptr>::iterator it = filters.begin();
     it != filters.end(); ++it) {
    (*it)->filter(dstImage);
}
Could anybody tell me what I’m doing wrong?
Since I couldn’t compile your code, I wrote my own, which I think is pretty similar. You flattened your 2D matrix into a 1D array and I didn’t, but I don’t think that affects what is going wrong for you.
On a quad-core machine I get no speedup either.
I think there is no speedup because the program is memory bound. That is, its running time is dominated by moving data to and from memory, not by computation. Each byte is read once and written once with almost no arithmetic in between, so no matter how many cores you have, you can’t go any faster: the cores are not the limiting factor.
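To make the point concrete, here is a minimal sketch of the same flip written as whole-row swaps. It is not the code from the question or from my test: flipVertical is a hypothetical helper that works on a raw byte buffer, with rowBytes standing in for IplImage::widthStep. The loop body does nothing but copy bytes, which is why extra threads have so little to contribute.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an OpenCV function): flips a raw image buffer
// upside down in place. rowBytes is assumed to play the role of widthStep.
void flipVertical(unsigned char* data, int height, std::size_t rowBytes) {
    assert(height >= 0);

    // Each iteration swaps one row in the top half with its mirror row in
    // the bottom half. Iterations touch disjoint rows, so they are safe to
    // run in parallel, but every byte is only loaded and stored once.
    #pragma omp parallel for schedule(static)
    for (int j = 0; j < height / 2; ++j) {
        unsigned char* top = data + static_cast<std::size_t>(j) * rowBytes;
        unsigned char* bottom =
            data + static_cast<std::size_t>(height - 1 - j) * rowBytes;
        std::swap_ranges(top, top + rowBytes, bottom);
    }
}
```

With GCC this needs -fopenmp for the pragma to take effect; without it the pragma is ignored and the loop runs serially. Timing both builds on a large buffer is a quick way to check whether your case behaves the same way as mine.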