I have a program that multiplies square matrices. It also estimates the program's performance with the formula (number of operations) / (run time). Why does performance decrease as the matrix dimension grows? Thanks.
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <sys/time.h>
using namespace std;

// Wall-clock time in seconds.
double getsec() {
    struct timeval t;
    gettimeofday(&t, NULL);
    return t.tv_sec + t.tv_usec * 0.000001;
}

int main(int argc, char* argv[])
{
    double begintime = getsec();
    int n;
    if (argc == 2) n = atoi(argv[1]);
    else n = 3;

    // Allocate three n x n matrices as arrays of row pointers.
    int** a = new int*[n];
    double** b = new double*[n];
    double** c = new double*[n];
    for (int i = 0; i < n; i++) {
        a[i] = new int[n];
        b[i] = new double[n];
        c[i] = new double[n];
    }

    // Initialize the inputs and zero the result.
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            a[i][j] = i + 1;
            b[i][j] = 1 / (j + 1.);
            c[i][j] = 0;
        }

    // Naive triple loop: c = a * b.
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                c[i][j] += a[i][k] * b[k][j];

    double qty_of_operations = (double)2 * n * n * n;
    cout << n << " c11=" << c[0][0] << " c1n=" << c[0][n-1] << " cn1=" << c[n-1][0] << " cnn=" << c[n-1][n-1] << " " << qty_of_operations / (getsec() - begintime) << endl;
    return 0;
}
I think you are asking why the average number of floating-point operations per second (FLOPS) decreases as the matrix size increases.
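The answer is: cache. The “naive” approach to matrix multiplication that you are using is terrible for cache performance. The inner loop reads b[k][j] with k varying, so it walks down a column of b: each access lands in a different row (here, a separately allocated array) instead of in adjacent memory. While the matrices fit in cache this costs little, but as the matrix grows, more and more of those accesses become cache misses.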
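To make the access pattern concrete, here is a minimal sketch that contrasts the i-j-k loop order from the question with an i-k-j order. Note that loop interchange is a different technique from the blocking discussed in this answer; it is shown only because it isolates the stride problem. The flat row-major `std::vector<double>` storage, the `Matrix` alias, and the names `multiply_ijk` and `multiply_ikj` are assumptions of this sketch, not part of the original program.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumption of this sketch: n x n matrices stored row-major in one flat
// vector (the original program uses arrays of row pointers instead).
using Matrix = std::vector<double>;

// Same loop order as the question (i, j, k). The inner loop reads
// b[k*n + j] with k varying, i.e. with a stride of n elements.
void multiply_ijk(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

// Interchanged order (i, k, j). The inner loop now walks a row of b and a
// row of c, so consecutive iterations touch adjacent memory.
void multiply_ikj(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    std::fill(c.begin(), c.end(), 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
}
```

Both functions perform the same 2*n*n*n operations, so timing them against each other for growing n should show how much of the slowdown comes from memory access rather than arithmetic. The actual gap depends on your hardware and compiler flags.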
If you’re determined to write this yourself (rather than using an existing linear-algebra library), you should investigate “blocking”, also known as “loop tiling”. See e.g. http://en.wikipedia.org/wiki/Loop_tiling. The basic idea is that you break the operation up into smaller blocks, sized so that the data each block works on fits in your cache.
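As a rough illustration of blocking, the sketch below multiplies the matrices tile by tile. It assumes flat row-major `std::vector<double>` storage rather than the row-pointer arrays in the question. The function name `multiply_tiled` and the default `block` of 64 are placeholders chosen for this example, not tuned values; a good block size depends on your cache and has to be found by measurement.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Blocked (tiled) multiplication c = a * b for n x n row-major matrices.
// `block` is the tile edge length; 64 is only a placeholder default.
void multiply_tiled(const std::vector<double>& a,
                    const std::vector<double>& b,
                    std::vector<double>& c,
                    std::size_t n,
                    std::size_t block = 64) {
    if (block == 0) block = 1;  // guard against a non-advancing loop
    std::fill(c.begin(), c.end(), 0.0);

    // Outer three loops pick a tile of a, b and c to work on.
    for (std::size_t ii = 0; ii < n; ii += block) {
        const std::size_t i_end = std::min(ii + block, n);
        for (std::size_t kk = 0; kk < n; kk += block) {
            const std::size_t k_end = std::min(kk + block, n);
            for (std::size_t jj = 0; jj < n; jj += block) {
                const std::size_t j_end = std::min(jj + block, n);

                // Inner three loops multiply one tile pair and accumulate
                // into the matching tile of c. std::min above handles the
                // partial tiles when n is not a multiple of block.
                for (std::size_t i = ii; i < i_end; ++i)
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double aik = a[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
        }
    }
}
```

Each tile of b is reused for `block` rows of a before the code moves on, which is what keeps the working set small. Calling `multiply_tiled(a, b, c, n, 32)` versus `multiply_tiled(a, b, c, n, 128)` for a large n is a simple way to see how sensitive the run time is to the tile size on your machine.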