Before you cringe at the duplicate title, the other question wasn’t suited to what

Asked: May 19, 2026 (2026-05-19T15:45:26+00:00)

Before you cringe at the duplicate title, the other question wasn’t suited to what I ask here (IMO). So.

I really want to use virtual functions in my application to make things a hundred times easier (isn’t that what OOP is all about ;)). But I read somewhere that they come at a performance cost. Seeing nothing but the same old contrived hype about premature optimization, I decided to give it a quick whirl in a small benchmark test using:

CProfiler.cpp

#include "CProfiler.h"

CProfiler::CProfiler(void (*func)(void), unsigned int iterations) {
    gettimeofday(&a, 0);
    for (;iterations > 0; iterations --) {
        func();
    }
    gettimeofday(&b, 0);
    result = (b.tv_sec * (unsigned int)1e6 + b.tv_usec) - (a.tv_sec * (unsigned int)1e6 + a.tv_usec);
};

main.cpp

#include "CProfiler.h"

#include <iostream>

class CC {
  protected:
    int width, height, area;
  };

class VCC {
  protected:
    int width, height, area;
  public:
    virtual void set_area () {}
  };

class CS: public CC {
  public:
    void set_area () { area = width * height; }
  };

class VCS: public VCC {
  public:
    void set_area () {  area = width * height; }
  };

void profileNonVirtual() {
    CS *abc = new CS;
    abc->set_area();
    delete abc;
}

void profileVirtual() {
    VCS *abc = new VCS;
    abc->set_area();
    delete abc;
}

int main() {
    int iterations = 5000;
    CProfiler prf2(&profileNonVirtual, iterations);
    CProfiler prf(&profileVirtual, iterations);

    std::cout << prf.result;
    std::cout << "\n";
    std::cout << prf2.result;

    return 0;
}

At first I only did 100 and 10000 iterations, and the results were worrying: 4ms for the non-virtualised version, and 250ms for the virtualised one! I almost went “nooooooo” inside, but then I upped the iterations to around 500,000, only to see the results become almost completely identical (maybe 5% slower without optimization flags enabled).

My question is: why was there such a significant difference with a low number of iterations compared to a high number? Was it purely because the virtual functions are hot in cache at that many iterations?

Disclaimer
I understand that my ‘profiling’ code is not perfect, but it gives a rough estimate of things, which is all that matters at this level. Also, I am asking these questions to learn, not solely to optimize my application.

1 Answer

Editorial Team · Answer 1 · 2026-05-19T15:45:27+00:00

Extending Charles’ answer.

The problem here is that your loop is doing more than just testing the virtual call itself (the memory allocation probably dwarfs the virtual call overhead anyway), so his suggestion is to change the code so that only the virtual call is tested.

The benchmark function below is a template because a templated call may be inlined, whereas calls through function pointers are unlikely to be.

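#include <sys/time.h>  // gettimeofday, timeval (POSIX)
#include <cstddef>     // size_t

// Returns the elapsed wall-clock time in microseconds.
template <typename Type>
double benchmark(Type const& t, size_t iterations)
{
  timeval a, b;
  gettimeofday(&a, 0);
  for (;iterations > 0; --iterations) {
    t.getArea();
  }
  gettimeofday(&b, 0);
  // Subtract before scaling so the seconds term cannot overflow.
  return (b.tv_sec - a.tv_sec) * 1e6 + (b.tv_usec - a.tv_usec);
}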
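
To make the contrast with the question's function-pointer profiler concrete, here is a minimal sketch of both harness styles side by side. The names run_via_pointer, run_via_template, area_of_3x7, and AreaOf3x7 are hypothetical and not part of the original code. Whether the templated call is actually inlined depends on the compiler and its flags.

```cpp
#include <cstddef>

// Hypothetical workload as a free function, callable through a pointer.
inline std::size_t area_of_3x7() { return 3 * 7; }

// The same workload as a function object, whose type is known at compile time.
struct AreaOf3x7 {
    std::size_t operator()() const { return 3 * 7; }
};

// Function-pointer harness (the style used by CProfiler in the question):
// the callee is only known through `func`, so each iteration is an indirect
// call unless the optimizer can prove what `func` points to.
inline std::size_t run_via_pointer(std::size_t (*func)(), std::size_t iterations) {
    std::size_t sum = 0;
    for (; iterations > 0; --iterations) {
        sum += func();
    }
    return sum;
}

// Template harness: `Callable` is a concrete type at instantiation, so the
// compiler is free to inline `call()` into the loop.
template <typename Callable>
std::size_t run_via_template(Callable call, std::size_t iterations) {
    std::size_t sum = 0;
    for (; iterations > 0; --iterations) {
        sum += call();
    }
    return sum;
}
```

Both run_via_pointer(&area_of_3x7, n) and run_via_template(AreaOf3x7(), n) return the same sum; only the call mechanism inside the loop differs.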

Classes:

struct Regular
{
  Regular(size_t w, size_t h): _width(w), _height(h) {}

  size_t getArea() const;

  size_t _width;
  size_t _height;
};

// The following line in another translation unit
// to avoid inlining
size_t Regular::getArea() const { return _width * _height; }

struct Base
{
  Base(size_t w, size_t h): _width(w), _height(h) {}

  virtual size_t getArea() const = 0;

  size_t _width;
  size_t _height;
};

struct Derived: Base
{
  Derived(size_t w, size_t h): Base(w, h) {}

  virtual size_t getArea() const;
};

// The following two functions in another translation unit
// to avoid inlining
size_t Derived::getArea() const  { return _width * _height; }

std::auto_ptr<Base> generateDerived()
{
  return std::auto_ptr<Base>(new Derived(3,7));
}

And the measuring:

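#include <cstdlib>   // atoi
#include <iostream>
#include <memory>    // std::auto_ptr (deprecated in C++11, removed in C++17)

int main(int argc, char* argv[])
{
  if (argc != 2) {
    std::cerr << "Usage: " << argv[0] << " iterations\n";
    return 1;
  }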

  Regular regular(3, 7);
  std::auto_ptr<Base> derived = generateDerived();

  double regTime = benchmark<Regular>(regular, atoi(argv[1]));
  double derTime = benchmark<Base>(*derived, atoi(argv[1]));

  std::cout << "Regular: " << regTime << "\nDerived: " << derTime << "\n";

  return 0;
}

Note: this measures the overhead of a virtual call compared with a regular function call. The two are not functionally equivalent, since the regular call has no runtime dispatch, so the figure is a worst-case overhead.
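
As an illustration of what that runtime dispatch buys, here is a small sketch in which one loop handles objects of different concrete types. The Shape, Rectangle, and Square types and the totalArea function are hypothetical names chosen for this example; they are not part of the benchmark above.

```cpp
#include <cstddef>

struct Shape {
    virtual ~Shape() {}
    virtual std::size_t getArea() const = 0;
};

struct Rectangle : Shape {
    Rectangle(std::size_t w, std::size_t h) : _width(w), _height(h) {}
    virtual std::size_t getArea() const { return _width * _height; }
    std::size_t _width;
    std::size_t _height;
};

struct Square : Shape {
    explicit Square(std::size_t s) : _side(s) {}
    virtual std::size_t getArea() const { return _side * _side; }
    std::size_t _side;
};

// Sums areas without knowing the concrete types: each getArea() call is
// resolved at run time from the object's dynamic type.
inline std::size_t totalArea(Shape const* const* shapes, std::size_t count) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += shapes[i]->getArea();
    }
    return total;
}
```

A non-virtual getArea() could not be used this way, because the function to call would be fixed by the static type of the pointer.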

EDIT:

Results of the run (gcc 3.4.2, -O2, SLES10 quad-core server). Note: the function definitions are in another translation unit, to prevent inlining.

> ./test 5000000
Regular: 17041
Derived: 17194

Not really convincing.
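
To put the two figures above in perspective, the following sketch turns them into a relative overhead and a per-call cost. It assumes the printed values are microseconds, as returned by the benchmark function; the helper names relativeOverheadPercent and perCallNanoseconds are hypothetical.

```cpp
// Relative overhead of `measured` over `baseline`, in percent.
inline double relativeOverheadPercent(double baseline, double measured) {
    return (measured - baseline) / baseline * 100.0;
}

// Average cost of one call in nanoseconds, given a total in microseconds.
inline double perCallNanoseconds(double totalMicroseconds, double calls) {
    return totalMicroseconds * 1000.0 / calls;
}
```

With the numbers from this run, relativeOverheadPercent(17041, 17194) is roughly 0.9, and perCallNanoseconds gives about 3.41 ns per regular call and about 3.44 ns per virtual call over 5,000,000 iterations. A gap that small may well be within measurement noise for a single run.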
