I have a class with two virtual member functions: foo and wrapper. foo is short and fast, and wrapper contains a loop that calls foo many times. My hope is that there is some way to inline the calls to foo inside the wrapper function, even when wrapper is called through a pointer to an object:
MyClass *obj = getObject();
obj->foo(); // As I understand it, this cannot be inlined. That's okay.
obj->wrapper(); // Nor will this. However, I hope that the machine code
// for the wrapper function will contain inlined calls to
// foo().
Essentially, I want the compiler to generate multiple versions of the wrapper function — one for each possible class — and inline calls to the appropriate foo, which should be possible since the object type is determined before picking which wrapper function to execute. Is this possible? Do any compilers support this optimization?
Edit: I appreciate all of the feedback and answers so far, and I may end up picking one of them. However, most responses ignore the last part of my question where I explain why I think this optimization should be feasible. That is really the crux of my question and I am still hoping someone can address that.
Edit 2: I picked Vlad’s answer since he both suggested the popular workaround and partially addressed my proposed optimization (in the comments of David’s answer). Thanks to everyone who wrote an answer — I read them all and there wasn’t a clear “winner”.
Also, I found an academic paper that proposes an optimization very similar to what I was suggesting: http://www.ebb.org/bkuhn/articles/cpp-opt.pdf.
In certain cases, the compiler can resolve virtual dispatch at compile time and emit a direct (non-virtual) call, or even inline the function. It can only do that if it can prove that your class is the “top” of the inheritance chain (nothing derives from it) or that those two functions are not overridden anywhere else. Often this is simply impossible, especially if you don’t have link-time optimization enabled for the whole program.
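One way to hand the compiler that proof yourself is the C++11 `final` specifier. Here is a minimal sketch, assuming two hypothetical classes `Base` and `Leaf` that are not from the question. Whether the call really gets inlined depends on your compiler and optimization level, so inspect the generated code before relying on it:

```cpp
// Hypothetical classes for illustration; Base and Leaf are not from the question.
struct Base {
    virtual ~Base() = default;
    virtual int foo() const { return 1; }

    // Here foo() has to stay a virtual call: *this may be a derived object.
    virtual int wrapper(int n) const {
        int sum = 0;
        for (int i = 0; i < n; ++i) sum += foo();
        return sum;
    }
};

// 'final' promises that nothing derives from Leaf.
struct Leaf final : Base {
    int foo() const override { return 2; }

    // Inside this override, 'this' is known to be exactly a Leaf, so foo()
    // can only be Leaf::foo and the compiler is allowed to call it directly
    // or inline it.
    int wrapper(int n) const override {
        int sum = 0;
        for (int i = 0; i < n; ++i) sum += foo();
        return sum;
    }
};
```

Calling `wrapper` through a `Base*` is still one virtual call, but the loop inside `Leaf::wrapper` no longer needs a virtual call per iteration. The cost is that each class has to supply its own copy of the loop.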
Unless you want to check the results of your compiler’s optimizations, your best bet would be not to use a virtual function in the inner loop at all. For example, something like this:
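A minimal sketch of that idea, assuming a hypothetical class `Counter` whose `bar` plays the role of `wrapper` from the question and whose private `foo_impl` holds the real work (all of these names are illustrative):

```cpp
#include <cstddef>

// Hypothetical class; Counter, foo_impl, bar and value are illustrative names.
class Counter {
public:
    virtual ~Counter() = default;

    // Virtual entry point for outside callers; forwards to the non-virtual body.
    virtual void foo() { foo_impl(); }

    // The loop calls the non-virtual helper directly, so the compiler sees an
    // ordinary member call that it is free to inline.
    virtual void bar(std::size_t iterations) {
        for (std::size_t i = 0; i < iterations; ++i) {
            foo_impl();
        }
    }

    long value() const { return value_; }

private:
    void foo_impl() { ++value_; }  // stands in for the short, fast body of foo
    long value_ = 0;
};
```

Both `foo` and `bar` remain virtual for callers, but the loop body no longer goes through the vtable.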
But in that case you clearly give up the idea that somebody may come in, inherit from your class, and throw in their own implementation of “foo” to be called by your “bar”. They will essentially need to re-implement both.
On the other hand, this smells a bit like premature optimization. Modern CPUs will most likely “lock” onto your loop, predict the exit from it, and execute the same µOPs over and over, even if your method is virtual. So I’d recommend you carefully confirm that this is actually a bottleneck before spending your time optimizing it.