I’m a newbie in C++ programming. I’m trying to see the benefits of moving all my MATLAB software to C++. I’m doing some finite element work, mainly nonlinear, so one of the operations I need to perform massively is the cross product of two vectors. I’ve tested two implementations in MATLAB and C++, and C++ seems to be much faster. In C++, however, two different implementations give different timings. I’m using Intel MKL.
Here is the code:
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <iostream>
#include <mkl.h>
void vprod( double vgr[3], double vg1[3], double vg2[3]);
int main() {
double v1[3]={1.22, 2.65, 3.65}, v2[3]={6.98, 98.159, 54.65}, vr[3];
int LC=1000000;
int i,j,k;
double tiempo=0.0, tinicial;
//------------------------------------------------------------------------
std::cout << "INLINE METHOD: " << std::endl;
tinicial = dsecnd();
for (i=0; i<LC; i++){
vr[0] = v1[1]*v2[2]-v1[2]*v2[1];
vr[1] =-(v1[0]*v2[2]-v1[2]*v2[0]);
vr[2] = v1[0]*v2[1]-v1[1]*v2[0];
};
tiempo = (dsecnd() - tinicial);
std::cout << "Tiempo Total: " << tiempo << std::endl;
std::cout << "Resultado: " << vr[0] << std::endl;
//------------------------------------------------------------------------
//------------------------------------------------------------------------
std::cout << "FUNCTION METHOD: " << std::endl;
tinicial = dsecnd();
for (i=0; i<LC; i++){
vprod (vr,v1,v2);
};
tiempo = (dsecnd() - tinicial);
std::cout << "Tiempo Total: " << tiempo << std::endl;
std::cout << "Resultado: " << vr[0] << std::endl;
//------------------------------------------------------------------------
std::cin.ignore();
return 0;
}
inline void vprod( double vgr[3], double vg1[3], double vg2[3]){
vgr[0] = vg1[1]*vg2[2]-vg1[2]*vg2[1];
vgr[1] =-(vg1[0]*vg2[2]-vg1[2]*vg2[0]);
vgr[2] = vg1[0]*vg2[1]-vg1[1]*vg2[0];
}
My question is: why is the first implementation 3 times faster than the second? Is this the result of function call overhead? Thanks!
EDIT: I’ve modified the code to keep the compiler from “guessing” the results of the loop with constant vectors. As @phonetagger showed, the results are very different. I got 28500 microseconds without using the vprod function and 29000 microseconds using it. These numbers were obtained with Ox optimization. Changing the optimization level doesn’t affect the comparison if the inline keyword is present, although the numbers rise a bit. If the inline keyword is not used (and optimization is off), the timings are 32000 microseconds without the vprod function and 37000 microseconds with it. So the function call overhead may be around 5000 microseconds over the 1000000 iterations.
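As a quick check of that estimate, the sketch below converts the two loop totals quoted above into an approximate cost per call. The function name per_call_overhead_ns is hypothetical, and the calculation assumes the whole difference between the two timings is due to the call itself, which may not hold exactly.

```cpp
// Hypothetical helper: estimates the per-call overhead in nanoseconds from two
// loop totals given in microseconds. Assumes the only difference between the
// two loops is the function call.
double per_call_overhead_ns(double with_call_us, double without_call_us, long calls) {
    const double difference_us = with_call_us - without_call_us;
    return difference_us * 1000.0 / static_cast<double>(calls);  // 1 us = 1000 ns
}
```

With the unoptimized numbers above (37000 and 32000 microseconds over 1000000 calls), this gives roughly 5 nanoseconds per call.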
The new code is:
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <iostream>
#include <mkl.h>
//#include <mkl_lapack.h>
void vprod( double *vgr, int ploc, double *vg1, double *vg2);
int main() {
int nv=1000000;
int dim=3*nv;
double *v1, *v2, *vr; // Declare Pointers
int ploc, i;
double tiempo=0.0, tinicial;
v1 = new double [dim]; //Allocate block of memory
v2 = new double [dim];
vr = new double [dim];
// Fill vectors with something
for (i = 0; i < dim; i++) {
v1[i] =1.25 + (double)(i+1);
v2[i] =2.62+ 2*(double)(i+7);
}
//------------------------------------------------------------------------
std::cout << "RUTINA CON CODIGO INLINE: \n" ;
tinicial = dsecnd();
ploc = 0; // ploc points to an intermediate location.
for (i=0; i<nv; i++){
vr[ploc] = v1[ploc+1]*v2[ploc+2]-v1[ploc+2]*v2[ploc+1];
vr[ploc+1] =-(v1[ploc]*v2[ploc+2]-v1[ploc+2]*v2[ploc]);
vr[ploc+2] = v1[ploc]*v2[ploc+1]-v1[ploc+1]*v2[ploc];
ploc +=3;
};
tiempo = (dsecnd() - tinicial);
std::cout << "Tiempo Total: " << tiempo << ".\n";
std::cout << "Resultado: " << vr[0] << ".\n";
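// Memory from new[] must be released with delete[], one statement per array
// ("delete v1,v2,vr;" uses the comma operator and only frees v1).
delete[] v1;
delete[] v2;
delete[] vr;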
v1 = new double [dim]; //Allocate block of memory
v2 = new double [dim];
vr = new double [dim];
//------------------------------------------------------------------------
//------------------------------------------------------------------------
std::cout << "RUTINA LLAMANDO A FUNCION: \n" ;
ploc=0;
tinicial = dsecnd();
for (i=0; i<nv; i++){
vprod ( vr, ploc, v1, v2);
ploc +=3;
};
tiempo = (dsecnd() - tinicial);
std::cout << "Tiempo Total: " << tiempo << ".\n";
std::cout << "Resultado: " << vr[0] << ".\n";
//------------------------------------------------------------------------
std::cin.ignore();
return 0;
}
inline void vprod( double *vgr, int ploc, double *vg1, double *vg2) {
vgr[ploc] = vg1[ploc+1]*vg2[ploc+2]-vg1[ploc+2]*vg2[ploc+1];
vgr[ploc+1] = -(vg1[ploc]*vg2[ploc+2]-vg1[ploc+2]*vg2[ploc]);
vgr[ploc+2] = vg1[ploc]*vg2[ploc+1]-vg1[ploc+1]*vg2[ploc];
}
Martin, you are absolutely right (ref. Martin’s comment… 3rd comment under my 17:57 Oct 5 2012 answer). Yes, it appears that at higher optimization levels the compiler recognized that the values in your arrays were known at compile time, so it could perform the entire computation, loop and all, at compile time and optimize the loop out entirely.
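To illustrate the effect described above, here is a minimal single-file sketch of one way to keep the inputs opaque to the optimizer. The names cross3 and run_unfoldable are hypothetical, and reading the inputs through volatile is an assumption made here for illustration; it is not the approach used for the measurements in this answer. The volatile reads carry their own cost, so treat this as a demonstration rather than a precise benchmark.

```cpp
#include <cstddef>

// Hypothetical helper: same arithmetic as vprod in the question.
void cross3(double r[3], const double a[3], const double b[3]) {
    r[0] = a[1] * b[2] - a[2] * b[1];
    r[1] = -(a[0] * b[2] - a[2] * b[0]);
    r[2] = a[0] * b[1] - a[1] * b[0];
}

// Runs the cross product `iterations` times on inputs the compiler cannot
// treat as compile-time constants, and returns a checksum so the loop body
// cannot be discarded as dead code.
double run_unfoldable(const double v1_in[3], const double v2_in[3],
                      std::size_t iterations) {
    // volatile forces a real memory read on every access.
    volatile double v1_src[3] = {v1_in[0], v1_in[1], v1_in[2]};
    volatile double v2_src[3] = {v2_in[0], v2_in[1], v2_in[2]};

    double checksum = 0.0;
    for (std::size_t i = 0; i < iterations; ++i) {
        double a[3] = {v1_src[0], v1_src[1], v1_src[2]};
        double b[3] = {v2_src[0], v2_src[1], v2_src[2]};
        double r[3];
        cross3(r, a, b);
        checksum += r[0];  // the result is consumed on every iteration
    }
    return checksum;
}
```

Printing the returned checksum after the timed region gives the compiler one more reason to keep the loop.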
I re-coded the test code into three separate files (one header & two source files) and broke the computation & loop out into a separate function to keep the compiler from being too smart with its optimizations. Now it can’t optimize the loops into a compile-time computation. Below are my new results. Note that I added another loop (0 to 50) around the original 0 to 1000000 loop, and then divided by 50. I did this for two reasons: It allows us to compare today’s numbers with the previous numbers, and it also averages out irregularities due to processes swapping in the middle of the test. That may not matter to you since I think dsecnd() reports only CPU time of its specific process?
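The repeat-and-average idea can be sketched roughly as follows. This is not the actual test code: the helper name average_seconds is hypothetical, and std::chrono::steady_clock (wall-clock time) is swapped in for dsecnd() so the sketch does not depend on MKL.

```cpp
#include <chrono>

// Hypothetical helper: calls `work` `repeats` times and returns the mean
// wall-clock time per call in seconds, mirroring the "loop 50 times, then
// divide by 50" approach described above.
template <typename Work>
double average_seconds(int repeats, Work work) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int r = 0; r < repeats; ++r) {
        work();  // one full pass, e.g. the 0 to 1000000 loop
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return elapsed.count() / repeats;
}
```

A call such as average_seconds(50, [&] { /* inner loop here */ }) would then report the averaged time for one pass. Keeping the inner loop in a separately compiled source file, as described above, is still needed to stop the compiler from folding it away.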
Anyway, here are my new results…….
(And yes, the odd result of “inline keyword, optimization -O1” being faster than -O2 or -O3 is repeatable, as is the oddity of “no inline keyword, optimization -O1”. I didn’t dig into the assembly to see why that might be.)