Before you cringe at the duplicate title, the other question wasn’t suited to what I ask here (IMO). So.
I am really wanting to use virtual functions in my application to make things a hundred times easier (isn’t that what OOP is all about ;)). But I read somewhere they came at a performance cost, seeing nothing but the same old contrived hype of premature optimization, I decided to give it a quick whirl in a small benchmark test using:
CProfiler.cpp
#include "CProfiler.h"
CProfiler::CProfiler(void (*func)(void), unsigned int iterations) {
gettimeofday(&a, 0);
for (;iterations > 0; iterations --) {
func();
}
gettimeofday(&b, 0);
result = (b.tv_sec * (unsigned int)1e6 + b.tv_usec) - (a.tv_sec * (unsigned int)1e6 + a.tv_usec);
};
main.cpp
#include "CProfiler.h"
#include <iostream>
class CC {
protected:
int width, height, area;
};
class VCC {
protected:
int width, height, area;
public:
virtual void set_area () {}
};
class CS: public CC {
public:
void set_area () { area = width * height; }
};
class VCS: public VCC {
public:
void set_area () { area = width * height; }
};
void profileNonVirtual() {
CS *abc = new CS;
abc->set_area();
delete abc;
}
void profileVirtual() {
VCS *abc = new VCS;
abc->set_area();
delete abc;
}
int main() {
int iterations = 5000;
CProfiler prf2(&profileNonVirtual, iterations);
CProfiler prf(&profileVirtual, iterations);
std::cout << prf.result;
std::cout << "\n";
std::cout << prf2.result;
return 0;
}
At first I only did 100 and 10000 iterations, and the results were worrying: 4ms for non virtualised, and 250ms for the virtualised! I almost went “nooooooo” inside, but then I upped the iterations to around 500,000; to see the results become almost completely identical (maybe 5% slower without optimization flags enabled).
My question is, why was there such a significant change with a low amount of iterations compared to high amount? Was it purely because the virtual functions are hot in cache at that many iterations?
Disclaimer
I understand that my ‘profiling’ code is not perfect, but it, as it has, gives an estimate of things, which is all that matters at this level. Also I am asking these questions to learn, not to solely optimize my application.
Extending Charles’ answer.
The problem here is that your loop is doing more than just testing the virtual call itself (the memory allocation probably dwarfs the virtual call overhead anyway), so his suggestion is to change the code so that only the virtual call is tested.
Here the benchmark function is template, because template may be inlined while call through function pointers are unlikely to.
Classes:
And the measuring:
Note: this tests the overhead of a virtual call in comparison to a regular function. The functionality is different (since you do not have runtime dispatch in the second case), but it’s therefore a worst-case overhead.
EDIT:
Results of the run (gcc.3.4.2, -O2, SLES10 quadcore server) note: with the functions definitions in another translation unit, to prevent inlining
Not really convincing.