I am trying to understand which implementation below is “faster”. Assume that one compiles this code with and without the -DVIRTUAL flag.
I assume that compiling without -DVIRTUAL will be faster because:
a] No vtable is used.
b] The compiler might be able to optimize the generated instructions because it “knows” exactly which call will be made, given the various options (there is only a finite number of them).
My question is PURELY related to speed, not pretty code.
a] Am I correct in my analysis above?
b] Will the branch predictor / compiler combination be smart enough to optimize for a given branch of the switch statement? Note that `type` is a `const int`.
c] Are there any other factors that I am missing?
Thanks!
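```cpp
#include <iostream>

// Build with -DVIRTUAL for virtual dispatch, or without it for
// switch-based dispatch on the stored type tag.
class Base
{
public:
    Base(int t) : type(t) {}
    ~Base() {}
    const int type;  // 1 = Derived1, 2 = Derived2
#ifdef VIRTUAL
    virtual void fn1() = 0;
#else
    void fn2();      // dispatches on 'type' with a switch
#endif
};

class Derived1 : public Base
{
public:
    Derived1() : Base(1) {}
    ~Derived1() {}
    void fn1() { std::cout << "in Derived1()" << std::endl; }
};

class Derived2 : public Base
{
public:
    Derived2() : Base(2) {}
    ~Derived2() {}
    void fn1() { std::cout << "in Derived2()" << std::endl; }
};

#ifndef VIRTUAL
// Defined after Derived1/Derived2 so the static_casts see complete types.
void Base::fn2()
{
    switch (type)
    {
    case 1:
        static_cast<Derived1*>(this)->fn1();
        break;
    case 2:
        static_cast<Derived2*>(this)->fn1();
        break;
    default:
        break;
    }
}
#endif

int main()
{
    Base *test = new Derived1();
#ifdef VIRTUAL
    test->fn1();
#else
    test->fn2();
#endif
    // 'test' is intentionally not deleted: ~Base() is not virtual, so
    // 'delete test' through a Base* would be undefined behavior.
    return 0;
}
```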
It depends on the platform and the compiler. A `switch` statement can be implemented as a test and branch or as a jump table (i.e., an indirect branch). A `virtual` function call is usually implemented as an indirect branch. If your compiler turns the `switch` statement into a jump table, the two approaches differ by one additional dereference. If that is the case, and this particular call happens infrequently enough (or thrashes the cache enough), you might see a difference due to an extra cache miss.

On the other hand, if the `switch` statement is simply a test and branch, you might see a much bigger performance difference on some in-order CPUs that flush the instruction pipeline on an indirect branch (or require a long latency between setting the destination of an indirect branch and jumping to it).

If you are really concerned with the overhead of virtual function dispatch, say, for an inner loop over a heterogeneous collection of objects, you might want to reconsider where you perform the dynamic dispatch. It doesn't have to be per object; it could also be per known grouping of objects of the same type.