I recently posted a question about the memory overhead due to virtuality in C++. The answers helped me understand how the vtable and vptr work.
My problem is the following: I work on supercomputers and have billions of objects of certain types, so I have to care about the memory overhead due to virtuality. According to my measurements, when I use classes with virtual functions, each derived object carries its own 8-byte vptr. This is not negligible at all.
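As a minimal illustration of that kind of measurement (the two struct names are hypothetical stand-ins, not the real classes), comparing `sizeof` with and without a virtual function shows the per-object cost:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical payload type without any virtual function.
struct PlainParticle {
    double mass;
};

// Same payload with a virtual function: the compiler adds a hidden vptr.
struct VirtualParticle {
    virtual ~VirtualParticle() = default;
    double mass;
};

// Per-object overhead in bytes caused by the vptr (plus any padding).
constexpr std::size_t vptr_overhead() {
    return sizeof(VirtualParticle) - sizeof(PlainParticle);
}
```

On a typical 64-bit platform this overhead is 8 bytes, but the exact value depends on the ABI.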
I wonder whether Intel icpc or g++ has some configuration option or parameter to use “global” vtables and indexes with adjustable precision instead of a vptr. Such a thing would allow me to use a 2-byte index (unsigned short int) instead of an 8-byte vptr for billions of objects, which would be a good reduction of the memory overhead. Is there any way to do that (or something like it) with compilation options?
Thank you very much.
Unfortunately… not automatically.
But remember that a v-table is nothing but syntactic sugar for runtime polymorphism. If you are willing to re-engineer your code, there are several alternatives.
1) External polymorphism
The idea is that sometimes you only need polymorphism in a transient fashion. That is, for example:
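A sketch of such a situation, assuming hypothetical `Animal`, `Cat` and `Dog` types: the objects are stored by value in homogeneous containers, and polymorphism is only needed during the call to `describe`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Classic approach: every Cat and Dog carries a vptr for its whole lifetime.
struct Animal {
    virtual ~Animal() = default;
    virtual std::string speak() const = 0;
};

struct Cat : Animal {
    std::string speak() const override { return "meow"; }
};

struct Dog : Animal {
    std::string speak() const override { return "woof"; }
};

// The only place where the dynamic type is not known statically.
std::string describe(Animal const& a) { return a.speak(); }

std::string describe_all(std::vector<Cat> const& cats,
                         std::vector<Dog> const& dogs) {
    std::string out;
    for (Cat const& c : cats) out += describe(c) + " ";
    for (Dog const& d : dogs) out += describe(d) + " ";
    return out;
}
```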
It seems wasteful for `Cat` or `Dog` to have a virtual pointer embedded in this situation, because you know the dynamic type (they are stored by value).

External polymorphism is about having pure concrete types and pure interfaces, as well as a simple bridge in the middle to temporarily (or permanently, but it’s not what you want here) adapt a concrete type to an interface.
The bridge is written once and for all, and you can then use it like so:
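A minimal sketch of the bridge and its use, assuming the concrete types expose a non-virtual `speak()` member (that method and the `describe` helpers are illustrative choices, not part of any library):

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Pure concrete types: no virtual functions, hence no vptr.
struct Cat { std::string speak() const { return "meow"; } };
struct Dog { std::string speak() const { return "woof"; } };
static_assert(!std::is_polymorphic<Cat>::value, "Cat carries no vptr");

// Pure interface.
class Animal {
public:
    virtual ~Animal() = default;
    virtual std::string speak() const = 0;
};

// Bridge: adapts any T with a speak() member to the Animal interface.
// It holds a reference, so it costs a vptr plus a pointer while it exists.
template <typename T>
class AnimalT : public Animal {
public:
    explicit AnimalT(T const& t) : ref_(t) {}
    std::string speak() const override { return ref_.speak(); }
private:
    T const& ref_;
};

// Code written against the interface only.
std::string describe(Animal const& a) { return a.speak(); }

template <typename T>
std::string describe_all(std::vector<T> const& items) {
    std::string out;
    for (T const& item : items) {
        out += describe(AnimalT<T>(item)) + " ";  // temporary bridge
    }
    return out;
}
```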
It incurs an overhead of two pointers per item, but only as long as you need polymorphism.
An alternative is to have `AnimalT<T>` work with values too (instead of references) and to provide a `clone` method, which lets you choose freely between having a v-pointer or not, depending on the situation.

In this case, I advise using a simple class:
And then modify the bridge a bit:
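One possible shape for this, as a sketch under assumptions: the wrapper `Ref<T>` and the `deref` helpers are hypothetical names, and `clone` returns a `std::unique_ptr` here for safety.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Cat { std::string speak() const { return "meow"; } };

// Thin wrapper meaning "hold a reference, not a copy".
template <typename T>
struct Ref {
    explicit Ref(T const& t) : ref(t) {}
    T const& ref;
};

// deref() gives uniform access whether the bridge stores a value or a Ref.
template <typename T> T const& deref(T const& t) { return t; }
template <typename T> T const& deref(Ref<T> const& r) { return r.ref; }

class Animal {
public:
    virtual ~Animal() = default;
    virtual std::string speak() const = 0;
    virtual std::unique_ptr<Animal> clone() const = 0;
};

// Bridge storing either a T (owning copy) or a Ref<T> (transient view).
template <typename Impl>
class AnimalT : public Animal {
public:
    explicit AnimalT(Impl impl) : impl_(impl) {}
    std::string speak() const override { return deref(impl_).speak(); }
    std::unique_ptr<Animal> clone() const override {
        return std::unique_ptr<Animal>(new AnimalT(*this));
    }
private:
    Impl impl_;
};

std::string demo() {
    Cat cat;
    AnimalT<Ref<Cat>> view{Ref<Cat>(cat)};  // transient: refers to cat
    AnimalT<Cat> owned{cat};                // polymorphic storage: owns a copy
    std::unique_ptr<Animal> copy = owned.clone();
    return view.speak() + " " + copy->speak();
}
```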
This way you choose when you want polymorphic storage and when you do not.
2) Hand-made v-tables
(only works easily on closed hierarchies)
It is common in C to emulate object orientation by providing one’s own v-table mechanism. Since you appear to know what a v-table is and how the v-pointer works, you can perfectly well implement it yourself.
You then provide a global array of v-tables for the hierarchy anchored in `Foo`. All you need in your `Foo` class is then to hold onto the index of the most derived type.

The closed hierarchy is there because of the `FooVTables` array and the `FooVTableIndex` enumeration, which need to be aware of all the types of the hierarchy.

The enum index can be bypassed though: by making the array non-constant, it is possible to pre-initialize it to a larger size and then, at init, have each derived type register itself there automatically. Conflicts of indexes are thus detected during this init phase, and it is even possible to have automatic resolution (scanning the array for a free slot).
This may be less convenient, but does provide a way to open the hierarchy. Obviously it’s easier to code before any thread is launched, as we are talking global variables here.
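The closed-hierarchy variant described above could be sketched as follows. `FooVTables` and `FooVTableIndex` are the names used in the text; the `Bar` class, the enumerators and the two "virtual" functions are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct Foo;  // base of the closed hierarchy

// One table of function pointers per concrete type.
struct FooVTable {
    std::string (*name)(Foo const&);
    int (*weight)(Foo const&);
};

// Closed list of the types in the hierarchy; one byte is enough here.
enum FooVTableIndex : std::uint8_t { FVT_Foo, FVT_Bar, FVT_Count };

extern FooVTable const FooVTables[FVT_Count];

struct Foo {
    explicit Foo(FooVTableIndex i = FVT_Foo) : index(i) {}
    std::string name() const { return FooVTables[index].name(*this); }
    int weight() const { return FooVTables[index].weight(*this); }
    FooVTableIndex index;  // replaces the 8-byte vptr
};

struct Bar : Foo {
    Bar() : Foo(FVT_Bar) {}
    int extra = 42;
};

// Implementations; the derived ones cast back to the most derived type.
std::string foo_name(Foo const&) { return "Foo"; }
int foo_weight(Foo const&) { return 1; }
std::string bar_name(Foo const&) { return "Bar"; }
int bar_weight(Foo const& f) { return static_cast<Bar const&>(f).extra; }

FooVTable const FooVTables[FVT_Count] = {
    { foo_name, foo_weight },  // FVT_Foo
    { bar_name, bar_weight },  // FVT_Bar
};

int total_weight() {
    Bar b;
    Foo f;
    Foo const& as_base = b;  // dispatches through the table for Bar
    return as_base.weight() + f.weight();
}
```

The index type is the knob the question asks about: `std::uint8_t` allows up to 256 types, and switching to `std::uint16_t` gives the 2-byte index mentioned there.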
3) Hand-made polymorphism
(only really works for closed hierarchies)
This last approach is based on my experience exploring the LLVM/Clang codebase. A compiler has the very same problem that you are faced with: for tens or hundreds of thousands of small items, a v-pointer per item really increases memory consumption, which is annoying.
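Therefore, they took a simple approach:
- each class hierarchy has a companion `enum` listing all members
- each class in the hierarchy passes its companion `enumerator` to its base upon construction
- virtuality is achieved by switching on the `enum` and casting appropriately

In code: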
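A minimal sketch of this scheme, assuming a hypothetical hierarchy `Foo` with two derived classes `Bar` and `Baz` and a single hand-dispatched function `name()` (none of these names come from LLVM itself):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

class Foo {
public:
    // Companion enum listing every member of the hierarchy.
    enum Kind : std::uint8_t { FK_Bar, FK_Baz };

    Kind kind() const { return kind_; }
    std::string name() const;  // "virtual", dispatched by hand below

protected:
    explicit Foo(Kind k) : kind_(k) {}

private:
    Kind kind_;  // one byte instead of a vptr
};

class Bar : public Foo {
public:
    Bar() : Foo(FK_Bar) {}  // passes its enumerator to the base
    std::string nameImpl() const { return "Bar"; }
};

class Baz : public Foo {
public:
    Baz() : Foo(FK_Baz) {}
    std::string nameImpl() const { return "Baz"; }
};

// Needs the full definition of every derived class.
std::string Foo::name() const {
    switch (kind_) {
    case FK_Bar: return static_cast<Bar const*>(this)->nameImpl();
    case FK_Baz: return static_cast<Baz const*>(this)->nameImpl();
    }
    return std::string();  // not reached for a valid kind
}
```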
The switches are pretty annoying, but they can be more or less automated by playing with some macros and type lists. LLVM typically uses a file like:
and then you do:
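The idea can be sketched with an X-macro. To keep the example in a single file, the hypothetical `FOO_KINDS` list macro below stands in for a separate definitions file that would be included several times; the derived classes are generated by macro only for brevity.

```cpp
#include <cassert>
#include <string>

// Stand-in for a separate definitions file: one entry per derived class.
#define FOO_KINDS(ENTRY) \
    ENTRY(Bar)           \
    ENTRY(Baz)

class Foo {
public:
    enum Kind {
#define X(Name) FK_##Name,
        FOO_KINDS(X)
#undef X
    };
    std::string name() const;

protected:
    explicit Foo(Kind k) : kind_(k) {}

private:
    Kind kind_;
};

// Derived classes generated from the list; real ones would be hand-written.
#define X(Name)                                        \
    class Name : public Foo {                          \
    public:                                            \
        Name() : Foo(FK_##Name) {}                     \
        std::string nameImpl() const { return #Name; } \
    };
FOO_KINDS(X)
#undef X

// The switch is generated from the same list, so it cannot fall out of sync.
std::string Foo::name() const {
    switch (kind_) {
#define X(Name) \
    case FK_##Name: return static_cast<Name const*>(this)->nameImpl();
        FOO_KINDS(X)
#undef X
    }
    return std::string();  // not reached for a valid kind
}
```

Adding a type then means adding one entry to the list and writing the class; the enum and every switch pick it up automatically.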
Chris Lattner commented that due to how switches are generated (using a table of code offsets) this produced code similar to that of a virtual dispatch, and thus had about the same amount of CPU overhead, but for a lower memory overhead.
Obviously, the one drawback is that `Foo.cpp` needs to include all of the headers of its derived classes, which effectively seals the hierarchy.

I deliberately presented the solutions from the most open one to the most closed one. They have various degrees of complexity/flexibility, and it is up to you to choose which one suits you best.
One important thing: in the latter two cases, destruction and copies require special care.