I'm working on a program that uses 16-byte 4x4 matrices of one-byte elements:
unsigned char matrix[4][4];
and a few 256-byte 16x16 matrices of one-byte elements:
unsigned char bigMatrix[16][16];
Because of the way the data is manipulated, I often have to loop column-wise, which I expect causes cache misses.
Will the performance improve if I use an array instead, i.e.
unsigned char matrix[16];
unsigned char bigMatrix[256];
and compute the index from a few variables each time, i.e.
matrix[variableA*variableB + i];
where variableA*variableB + i has to be recalculated every time I access an element.
I only care about speed; memory use is not a concern. Will this help or hurt performance, or is the difference too small to matter?
It makes no difference. The data is laid out in the exact same way in either case, and is accessed in the same way too. I’d be surprised if it didn’t generate exactly the same assembly, even.
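To make the layout point concrete, here is a minimal sketch. The function names and the constant N are made up for illustration and are not from the question. It relies on C storing a 2D array contiguously in row-major order, so element [row][col] sits at byte offset row*N + col, which is the same index you would compute by hand for a flat array.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 16 };  /* side length of the big matrix from the question */

/* Sum one column using the 2D declaration. */
unsigned sum_column_2d(unsigned char m[N][N], size_t col)
{
    assert(col < N);
    unsigned sum = 0;
    for (size_t row = 0; row < N; ++row)
        sum += m[row][col];
    return sum;
}

/* Sum the same column through a flat array with a hand-computed index. */
unsigned sum_column_flat(const unsigned char *m, size_t col)
{
    assert(col < N);
    unsigned sum = 0;
    for (size_t row = 0; row < N; ++row)
        sum += m[row * N + col];
    return sum;
}

/* Returns 1 if m[row][col] lives at byte offset row*N + col
   from the start of the matrix, 0 otherwise. */
int same_address(unsigned char m[N][N], size_t row, size_t col)
{
    const unsigned char *base = (const unsigned char *)m;
    return &m[row][col] == base + row * N + col;
}
```

Both loops walk memory with the same stride of N bytes, so the access pattern is identical; the compiler simply does the index arithmetic for you in the 2D version.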
However, with a 256-byte table you're unlikely to get cache misses either way. A CPU's L1 cache is typically between 32 and 128 KB, so the whole table fits in it many times over, and once it has been loaded, column-wise access should hit the cache just like row-wise access.
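For a rough sense of scale, the sketch below counts how many cache lines a table occupies. The helper name is hypothetical, and the 64-byte line size used in the usage note is an assumption (it is common on current x86 and ARM parts, but check your own CPU). The count assumes the table starts on a line boundary; an unaligned table can touch one extra line.

```c
#include <assert.h>
#include <stddef.h>

/* Number of cache lines needed to hold `bytes` bytes, assuming the data
   starts on a cache-line boundary. `line_bytes` is the line size in bytes. */
size_t cache_lines_spanned(size_t bytes, size_t line_bytes)
{
    assert(line_bytes > 0);
    return (bytes + line_bytes - 1) / line_bytes;  /* round up */
}
```

With an assumed 64-byte line, cache_lines_spanned(256, 64) is 4 and cache_lines_spanned(16, 64) is 1, so even the big matrix is only a handful of lines.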