Is it possible to use any loop optimization technique here to reduce the execution time? I need to keep the nested loops over i and j because I need every (i, j) combination.
EDIT: even if I leave out the “actual” code and keep only this trivial assignment, the program takes ~5s on my dual-core box, whereas with the actual code it takes ~6s. I experimented with replacing fn_value+=0 by j+=0, and it then takes ~1.73s. What could be causing this?
#include <stdio.h>
#include <time.h>

int main(int argc, char **argv)
{
    float fn_value=0.0;
    int n=10,i,j;
    unsigned int k;
    clock_t start, end;

    start = clock();
    for(k=0;k<9765625;k++)
    {
        for(i=0;i<n;i++)
        {
            for(j=i;j<n;j++)
                // substitute for an "actual" piece of code
                fn_value+=0;
        }
    }
    end= clock();
    printf("Time taken %lf", (double) (end-start) / CLOCKS_PER_SEC);
    return 0;
}
You could do loop unrolling. Actually, you could just pass your compiler an option that unrolls these loops for you (the exact option depends on your compiler; with gcc, for example, it is -funroll-loops).
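To make "unrolling" concrete, here is a rough sketch of the inner j loop unrolled by hand by a factor of 2. Since I haven't seen your real code, body() is a made-up stand-in for it, and sum_rolled() / sum_unrolled() are just names I picked for the illustration. Whether this actually beats what the compiler does on its own is something you would have to measure.

```c
/* Hypothetical stand-in for the "actual" per-(i, j) work. */
static float body(int i, int j)
{
    return (float)(i + j);
}

/* The original triangular loop nest: visits every (i, j) with i <= j < n. */
static float sum_rolled(int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        for (int j = i; j < n; j++)
            acc += body(i, j);
    return acc;
}

/* Same pairs in the same order, but the inner loop handles two values
 * of j per iteration, which halves the number of loop-condition checks. */
static float sum_unrolled(int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++) {
        int j = i;
        for (; j + 1 < n; j += 2) {
            acc += body(i, j);
            acc += body(i, j + 1);
        }
        /* Leftover iteration when the row has an odd number of elements. */
        if (j < n)
            acc += body(i, j);
    }
    return acc;
}
```

Both functions visit the (i, j) pairs in the same order, so sum_rolled(10) and sum_unrolled(10) should give the same result; the leftover check after the unrolled loop is the part that is easy to forget.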
I don’t know what your “actual code” is, so I can’t give you more specific advice. One thing you will want to optimize, if you are doing something non-trivial, is your cache access.
Also, are you compiling with optimization enabled (e.g. -O3 in gcc)?
Per your edit:
The reason “j+=0” is faster than “fn_value+=0” is most likely that integer addition is generally cheaper than floating-point addition; how large the gap is depends on your CPU and on your compiler flags.
This is why we need to see the actual code before we can suggest informed optimizations.
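If you want to check the "fn_value+=0" versus "j+=0" difference on your own machine, a small sketch like the following times the two kinds of addition in isolation. The function names are made up for this example, and the accumulators are declared volatile only so that an optimizing compiler cannot delete the otherwise useless additions; treat the numbers as a rough indication, since they depend on your compiler, flags and CPU.

```c
#include <time.h>

/* Times `iters` additions of 0 to a float accumulator.
 * Returns elapsed CPU seconds; the final accumulator goes to *out. */
static double time_float_add(unsigned long iters, float *out)
{
    volatile float acc = 0.0f;   /* volatile: keep the additions alive */
    clock_t start = clock();
    for (unsigned long k = 0; k < iters; k++)
        acc += 0;
    clock_t end = clock();
    *out = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Same loop, but with an int accumulator. */
static double time_int_add(unsigned long iters, int *out)
{
    volatile int acc = 0;        /* volatile: keep the additions alive */
    clock_t start = clock();
    for (unsigned long k = 0; k < iters; k++)
        acc += 0;
    clock_t end = clock();
    *out = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Your original loop nest executes its body 9765625 * 55 = 537109375 times (55 is the number of (i, j) pairs with i <= j < 10), so calling both functions with iters set to that value and printing the two return values gives a comparable workload.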