I have the following nested loop computation:
    int aY = a*Y, aX = a*X;
    for (int i = 0; i < aY; i += a)
    {
        for (int j = 0; j < aX; j += a)
        {
            xInd = i - j + offX;
            yInd = i + j + offY;
            if ((xInd >= 0) && (xInd < X) &&
                (yInd >= 0) && (yInd < Y))
            {
                z = yInd*X + xInd;
                // use z
            }
        }
    }
I want to remove the dependency on i, j, xInd and yInd as much as possible. In other words, I want to "traverse" all of the values z takes while running through the loop, but without involving the helper variables i, j, xInd and yInd, or at least with a minimal number of computations (most importantly, with no multiplications). How can I do that? Any other hints on making the loop more efficient would also be welcome. Thanks!
If we read the question as how to minimize the number of iterations of the loop, we can take the following approach.
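The constraints

$$0 \le \mathit{xInd} < X, \qquad 0 \le \mathit{yInd} < Y$$

allow us to tighten the bounds of the for loops. Expanding xInd and yInd gives:

$$0 \le i - j + \mathit{offX} < X, \qquad 0 \le i + j + \mathit{offY} < Y$$

Fixing i allows us to rewrite the second loop's bounds as:

$$\max(0,\ i + \mathit{offX} - X + 1,\ -i - \mathit{offY}) \;\le\; j \;\le\; \min(aX - 1,\ i + \mathit{offX},\ Y - 1 - i - \mathit{offY})$$

where j must additionally be a multiple of a, so the lower bound has to be rounded up to the next multiple of a. If you know more about the possible values of offX, offY, a, X and Y, further reductions may be possible.

Note that in reality you probably wouldn't want to blindly apply this type of optimisation without profiling first (it may prevent the compiler from doing this for you, e.g. GCC's Graphite loop optimizer).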
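A minimal C sketch of the tightened inner loop, assuming a > 0 and that "use z" only needs the value of z. Since z = yInd*X + xInd = (X+1)*i + (X-1)*j + X*offY + offX, every step of j by a adds the constant a*(X-1) to z, so the inner loop needs only additions; one multiplication per row remains to seed z. The function names visit_naive and visit_tight, and the out[] buffer standing in for "use z", are hypothetical.

```c
#include <stddef.h>

/* Reference: the original nested loop, recording each valid z in out[]. */
size_t visit_naive(int a, int X, int Y, int offX, int offY, int *out)
{
    int aY = a * Y, aX = a * X;
    size_t n = 0;
    for (int i = 0; i < aY; i += a)
        for (int j = 0; j < aX; j += a) {
            int xInd = i - j + offX;
            int yInd = i + j + offY;
            if (xInd >= 0 && xInd < X && yInd >= 0 && yInd < Y)
                out[n++] = yInd * X + xInd;
        }
    return n;
}

static int max3(int p, int q, int r) { int m = p > q ? p : q; return m > r ? m : r; }
static int min3(int p, int q, int r) { int m = p < q ? p : q; return m < r ? m : r; }

/* Tightened: per row i, compute the valid j range in closed form and walk
   only that range, stepping z by a constant instead of recomputing it. */
size_t visit_tight(int a, int X, int Y, int offX, int offY, int *out)
{
    int aY = a * Y, aX = a * X;
    int dz = a * (X - 1);      /* change in z when j grows by a (hoisted) */
    size_t n = 0;
    for (int i = 0; i < aY; i += a) {
        int lo = max3(0, i + offX - X + 1, -i - offY);
        int hi = min3(aX - 1, i + offX, Y - 1 - i - offY);
        lo = (lo + a - 1) / a * a;  /* lo >= 0: round up to a multiple of a */
        if (lo > hi)
            continue;                /* no valid j in this row */
        int z = (i + lo + offY) * X + (i - lo + offX);  /* one multiply per row */
        for (int j = lo; j <= hi; j += a, z += dz)
            out[n++] = z;
    }
    return n;
}
```

Both functions produce z in the same order as the original loop, so their outputs can be compared element by element. The out buffer needs room for X*Y entries, because every valid z lies in [0, X*Y) and no value is produced twice.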
Use as index
If the value z = yInd*X + xInd is being used to index memory, a bigger win can be achieved by making the memory accesses sequential, which gives good cache behaviour. Currently yInd changes on every iteration of the inner loop (so z jumps by a*(X-1) each time), which can result in poor cache performance. A solution to this issue would be to first compute and store all the indices, then do all the memory operations in a second pass using these indices.
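A sketch of the two-pass idea in C, under the assumption that the work done with each z is order-independent (otherwise the indices cannot be reordered). Pass one records every z; pass two sorts them with the standard library's qsort so that memory is touched in ascending address order. gather_indices, apply_in_order, and the `data[z] += 1.0` placeholder for "use z" are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Pass 1 (hypothetical helper): run the original loop, but only record z. */
size_t gather_indices(int a, int X, int Y, int offX, int offY, int *idx)
{
    int aY = a * Y, aX = a * X;
    size_t n = 0;
    for (int i = 0; i < aY; i += a)
        for (int j = 0; j < aX; j += a) {
            int xInd = i - j + offX;
            int yInd = i + j + offY;
            if (xInd >= 0 && xInd < X && yInd >= 0 && yInd < Y)
                idx[n++] = yInd * X + xInd;
        }
    return n;
}

/* qsort comparator for ints that avoids overflow from subtraction. */
static int cmp_int(const void *p, const void *q)
{
    int u = *(const int *)p, v = *(const int *)q;
    return (u > v) - (u < v);
}

/* Pass 2 (hypothetical helper): sort the indices in place, then touch
   memory in ascending order. Only valid if the "use z" work can be
   done in any order. */
void apply_in_order(double *data, int *idx, size_t n)
{
    qsort(idx, n, sizeof idx[0], cmp_int);
    for (size_t k = 0; k < n; k++)
        data[idx[k]] += 1.0;   /* placeholder for "use z" */
}
```

For example, with a = 1, X = Y = 5 and zero offsets, the loop produces z in the order 0, 6, 10, 12, 16, 20, 18, 22, 24, which is not memory order; after sorting, the second pass visits 0, 6, 10, 12, 16, 18, 20, 22, 24. qsort is used here for brevity. Because every z lies in [0, X*Y) and appears at most once, marking a flag array of length X*Y in pass one and scanning it in pass two would give the same ascending order in linear time.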