I am working on a loop like this:
int arrA[BIG], arrB[BIG], arrC[BIG];
for (int i = 0; i < BIG; i++) {
do_operation(arrA[i], arrB[i], arrC[i]);
}
Here do_operation is not an actual function; it just stands for some operations on the elements of A, B, and C.
From the profiling data, it looks like the cache miss rate is high.
How can I rewrite the loop with better cache behavior?
Thanks for any comment!
You are accessing each array linearly, which is essentially optimal for cache usage (and for the hardware prefetcher).
However, if your arrays are an unfortunate size (usually large powers of two), you will get thrashing;
arrA[i], arrB[i] and arrC[i] will all map to the same cache set and constantly evict each other. In the worst case, essentially every single access will be a cache miss. To avoid this, you should try padding each array slightly. See, e.g., "Understanding cache thrashing".
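As a rough sketch of what padding could look like (not a tuned solution): the snippet below assumes a 64-byte cache line and that BIG is a large power of two, and it puts the three arrays in one struct so that the padding actually controls their relative offsets. The names padded_arrays, pad_ab, pad_bc, cache_set and run_loop are made up for illustration, and c[i] = a[i] + b[i] is only a stand-in for do_operation. The right amount of padding depends on your cache geometry, so measure before and after.

```c
#include <stddef.h>
#include <stdint.h>

#define BIG (1u << 20)                        /* assumed: large power-of-two element count */
#define CACHE_LINE 64                         /* assumed cache-line size in bytes */
#define PAD_INTS (CACHE_LINE / sizeof(int))   /* one cache line worth of ints */

/* Keeping the arrays in one struct fixes their layout relative to each
   other; with separate globals the linker may place them anywhere. */
struct padded_arrays {
    int a[BIG];
    int pad_ab[PAD_INTS];   /* shifts b by one cache line relative to a */
    int b[BIG];
    int pad_bc[PAD_INTS];   /* shifts c by one more line relative to b */
    int c[BIG];
};

static struct padded_arrays arrs;

/* Set index an address would map to in a hypothetical cache with
   num_sets sets of CACHE_LINE-byte lines (a simplified model). */
static size_t cache_set(const void *p, size_t num_sets)
{
    return (size_t)(((uintptr_t)p / CACHE_LINE) % num_sets);
}

/* Stand-in for the original loop; the addition is a placeholder for
   whatever do_operation really does. */
static void run_loop(void)
{
    for (size_t i = 0; i < BIG; i++) {
        arrs.c[i] = arrs.a[i] + arrs.b[i];
    }
}
```

With this layout, arrs.a[i], arrs.b[i] and arrs.c[i] land in three consecutive sets of the modelled cache instead of the same one; you can check that with cache_set(&arrs.a[i], 64) and friends. The 64 sets here correspond to, for example, a 32 KiB 8-way cache with 64-byte lines, which is an assumption about the hardware rather than something known from the question.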