Here is a small test I ran, and the result surprised me: filling the two arrays in two separate loops was approximately twice as fast as filling both in a single loop. I am guessing it is because of memory access?
float* A = new float[1000000];
float* B = new float[1000000];
int h,w;
h = w = 1000;
CString txt;
double time1, time2;
time1 = Timer::instance()->getTime();
for(int j = 0; j < h; j++){
for(int i = 0; i < w; i++){
A[i+j*w] = 1;
B[i+j*w] = 1;
}
}
time2 = Timer::instance()->getTime();
txt.Format(_T("Both in same loop = %f"),time2-time1);
AfxMessageBox(txt);
time1 = Timer::instance()->getTime();
for(int j = 0; j < h; j++){
for(int i = 0; i < w; i++){
A[i+j*w] = 1;
}
}
for(int j = 0; j < h; j++){
for(int i = 0; i < w; i++){
B[i+j*w] = 1;
}
}
time2 = Timer::instance()->getTime();
txt.Format(_T("Different loops = %f"),time2-time1);
AfxMessageBox(txt);
It could be the CPU cache, but more likely it's the interleaved memory access. When you access array1[x] and then immediately afterwards array2[x], those are two very different locations in memory, and that is difficult to optimize. However, array[0], array[1], array[2], etc. are all in contiguous memory and much more efficient to access. Intel seems to agree.
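If you want to check this on your own machine, here is a rough, portable sketch of the same comparison using std::chrono instead of the Timer class from the question. The names fill_fused, fill_split, time_ms and run_benchmark are my own, not from the original code. Two things are assumptions on my part: the arrays are touched once before timing, since on most systems freshly allocated memory is only mapped on first write and that cost could otherwise land in whichever loop runs first, and each variant is timed several times instead of once. An optimizing compiler may also turn these loops into library fill calls, so treat the numbers as indicative only.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// One pass that writes both arrays in the same iteration.
void fill_fused(std::vector<float>& a, std::vector<float>& b, float value) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = value;
        b[i] = value;
    }
}

// Two passes, each writing one array from start to end.
void fill_split(std::vector<float>& a, std::vector<float>& b, float value) {
    for (std::size_t i = 0; i < a.size(); ++i) a[i] = value;
    for (std::size_t i = 0; i < b.size(); ++i) b[i] = value;
}

// Runs the callable once and returns the elapsed wall time in milliseconds.
template <typename F>
double time_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Times both variants `repeats` times on arrays of n floats each.
void run_benchmark(std::size_t n, int repeats) {
    // The vector constructor zero-fills, so every page is touched before timing.
    std::vector<float> a(n), b(n);
    for (int r = 0; r < repeats; ++r) {
        const double fused = time_ms([&] { fill_fused(a, b, 1.0f); });
        const double split = time_ms([&] { fill_split(a, b, 1.0f); });
        // Reading an element back keeps the stores observable.
        std::printf("run %d: same loop = %.3f ms, different loops = %.3f ms (a[0] = %.0f)\n",
                    r, fused, split, a[0]);
    }
}
```

Calling run_benchmark(1000000, 5) from main matches the array size in the question. If the gap shrinks or disappears after the warm-up, the first-touch cost was probably a large part of what the original test measured.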