The code below calculates the correlation matrix given a covariance matrix. How can I write this better? The issue is that this section of code will run thousands of times on matrices whose dimensions are about 100 x 100.
    // Copy upper triangle of covariance matrix to correlation matrix
    for(i = 0; i < rows; i++){
        for(j = i; j < rows; j++){
            corrmatrix.array[i * rows + j] = covmatrix.array[i * rows + j];
        }
    }
    // Calculate upper triangle of corr matrix
    for(i = 0; i < rows; i++){
        root = sqrt(covmatrix.array[(i * rows) + i]);
        for(j = 0; j <= i; j++){ // Move down
            corrmatrix.array[ j * rows + i ] /= root;
        }
        k = i * rows;
        for(j = i; j < rows; j++){ // Move across
            corrmatrix.array[ k + j ] /= root;
        }
    }
    // Copy upper triangle to lower triangle
    for(i = 0; i < rows; i++){
        k = i * rows;
        for(j = i; j < rows; j++){
            corrmatrix.array[ (j * rows) + i ] = corrmatrix.array[ k + j ];
        }
    }
I have made checks that the rows and columns are equal etc, so I am just using rows everywhere. I want to optimize the speed (significantly).
PS:
- Matrices are stored in row-major, dense format
- I am not using packed storage for now.
Thank you
The first thing that jumps out at me is that you’re doing division by the same number in your inner loops.
Don’t do that. Division is slow.
What you should do is compute the reciprocal of root once per row and multiply by it, rather than dividing by root repeatedly.
Although this optimization may seem obvious, the compiler may not be allowed to perform it itself because of floating-point strictness: multiplying by a reciprocal can round differently from dividing. You can try relaxing your floating-point settings with -ffast-math (in GCC) or something similar.