The Price of Inheritance in C++
2026-06-25
Inheritance is a common technique in object-oriented C++ for layering code: shared logic lives in the base class, while specialized logic lives in the derived classes.
#include <iostream>
class Base {
public:
virtual auto hello() -> void {
std::cout << "Hello, from base" << std::endl;
}
};
class Derived : public Base {
public:
auto hello() -> void override {
std::cout << "Hello, from derived" << std::endl;
}
};
When using it, if we don't want third parties to know the concrete implementation behind Base, we can hand out a base class pointer:
Base* b = new Derived();
b->hello();
Or allow direct use of the derived class Derived:
Derived* d = new Derived();
d->hello();
Regardless of which approach is used, we can correctly invoke the Derived class's implementation and get the output:
Hello, from derived
However, the cost behind each approach is different.
Virtual Function Call Overhead
When calling a virtual function through a base class pointer, the compiler cannot determine the actual function implementation at compile time and must query the vtable (virtual function table) at runtime for an indirect call.
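On typical 64-bit platforms (for example, GCC and Clang on x86-64), every object of a class with virtual functions stores a vptr (virtual function table pointer) in its first 8 bytes, pointing to the vtable for that class. The vtable is an array of function pointers.
For a call in tail position, the entire table lookup and call process requires two instructions (a jmp rather than a call, because nothing remains to be done after the callee returns):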
movq (%rdi), %rax # Read first 8 bytes of object → vptr → vtable address
jmp *(%rax) # Fetch function address from vtable[0], indirect jump
Compared to a direct call, a virtual function call adds:
- 1 extra memory access (reading the vptr)
- 1 indirect jump, whose target must first be loaded from the vtable (harder for the CPU branch predictor to handle than a direct call)
This is the typical runtime overhead of virtual function calls.
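To make the mechanism concrete, here is a minimal sketch that emulates by hand what the compiler generates for a virtual call. The names ManualVTable, ManualObject, kBaseVTable, kDerivedVTable, and call_hello are hypothetical, and the functions return strings instead of printing so the result can be checked. Real vtables also carry extra entries (such as RTTI data), so this illustrates the idea rather than the actual ABI layout.

```cpp
#include <cassert>
#include <string>

struct ManualObject;  // forward declaration for the function pointer type

// Hypothetical hand-written vtable: one slot per virtual function.
struct ManualVTable {
    std::string (*hello)(const ManualObject* self);  // slot 0
};

// Every object starts with a pointer to its class's table (the "vptr").
struct ManualObject {
    const ManualVTable* vptr;
};

auto base_hello(const ManualObject*) -> std::string { return "Hello, from base"; }
auto derived_hello(const ManualObject*) -> std::string { return "Hello, from derived"; }

// One table per class, shared by all objects of that class.
const ManualVTable kBaseVTable{&base_hello};
const ManualVTable kDerivedVTable{&derived_hello};

// What a virtual call boils down to: load the vptr, load the slot, call indirectly.
auto call_hello(const ManualObject* obj) -> std::string {
    assert(obj != nullptr && obj->vptr != nullptr);
    const ManualVTable* vtable = obj->vptr;  // read the vptr from the object
    auto target = vtable->hello;             // read slot 0 from the vtable
    return target(obj);                      // indirect call through the loaded address
}
```

Constructing `ManualObject d{&kDerivedVTable};` and calling `call_hello(&d)` selects derived_hello purely from the data stored in the object, which is why the compiler cannot resolve the target without knowing the object's dynamic type.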
Devirtualization
If the compiler can determine the actual type at compile time, it can skip the vtable lookup step and call the target function directly, or even inline the function body. This is called Devirtualization.
Direct Use of the Derived Class
When calling a virtual function directly through an object (rather than a pointer/reference), the compile-time type is fully determined:
auto DerivedHello() -> void {
Derived d{};
d.hello(); // The compiler knows d is Derived, inlines directly
}
Here, after compilation, the body of hello() is fully expanded — no vtable lookup, no function call, just a direct cout output:
DerivedHello:
movl $19, %edx # String length
leaq .LC0(%rip), %rsi # "Hello, from derived"
leaq _ZSt4cout(%rip), %rbx
movq %rbx, %rdi
call _ZSt16__ostream_insert... # Direct call to std::cout
...
The final Keyword
The final keyword prohibits further inheritance, giving the compiler a definitive guarantee. Even when called through a pointer, full devirtualization is possible:
class FinalDerived final : public Base {
public:
auto hello() -> void override {
std::cout << "Hello, from final derived" << std::endl;
}
};
auto call_via_final(FinalDerived* f) -> void {
f->hello(); // FinalDerived is final, no subclasses possible, can inline directly
}
With the class marked final, the compiled output has no vptr read at all:
call_via_final:
leaq _ZSt4cout(%rip), %rbp
movl $25, %edx
leaq .LC2(%rip), %rsi # "Hello, from final derived" string address
call _ZSt16__ostream_insert... # Direct inline, no vptr read
...
| | Derived* (non-final) | FinalDerived* (final) |
|---|---|---|
| Can have subclasses | Yes (other translation units may define them) | No, final prohibits it |
| vptr read | Required | Not needed |
| Branch | Yes (compare + fallback) | No |
| Runtime overhead | 1 memory access + 1 comparison + 1 branch prediction | 0 |
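The final specifier can also be applied to a single virtual function instead of the whole class, which gives the compiler the same guarantee for that one function while leaving the class inheritable. The sketch below is an illustration under assumptions: MethodFinal, ClassFinal, and the two call_* helpers are hypothetical names, and hello() returns a string rather than printing so the behavior can be checked.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

class Base {
public:
    virtual ~Base() = default;
    virtual auto hello() const -> std::string { return "Hello, from base"; }
};

// Only hello() is sealed; the class itself can still be derived from.
class MethodFinal : public Base {
public:
    auto hello() const -> std::string final { return "Hello, from method-final"; }
};

// The whole class is sealed, like FinalDerived in the article.
class ClassFinal final : public Base {
public:
    auto hello() const -> std::string override { return "Hello, from class-final"; }
};

// In both helpers the static type pins down which hello() runs, so a
// compiler is allowed to call it directly instead of going through the vtable.
auto call_method_final(const MethodFinal& m) -> std::string { return m.hello(); }
auto call_class_final(const ClassFinal& c) -> std::string { return c.hello(); }

static_assert(!std::is_final<MethodFinal>::value, "class is still inheritable");
static_assert(std::is_final<ClassFinal>::value, "class is sealed");
```

Whether a given compiler actually removes the vptr read in either helper depends on its optimization level, so it is worth confirming in the generated assembly.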
Parent and Derived Classes in the Same Translation Unit
If the virtual function call and object construction are in the same translation unit, the compiler can see all type information and devirtualize directly. But if they are split into different translation units, the situation changes.
We place the virtual function call in a separate .cpp file that only includes base.h and knows nothing about Derived:
// caller.cpp — only includes base.h, unaware of Derived's existence
#include "caller.h"
auto call_hello(Base* b) -> void {
b->hello(); // The compiler is forced to go through the vtable
}
// app.cpp — creates Derived, passes it to call_hello
#include "caller.h"
#include "derived.h"
auto main() -> int {
Derived d{};
call_hello(&d);
return 0;
}
When compiling caller.cpp, the compiler only sees the Base class and doesn't know about the Derived class, so it must conservatively generate a vtable indirect call:
call_hello:
....
movq (%rdi), %rax # Read vptr
...
jmp *%rax # vtable indirect call
...
Enabling LTO
LTO (Link-Time Optimization) merges the intermediate representation (IR) of all translation units at the link stage. At this point, the compiler can see the actual types across files and devirtualize:
g++ -O2 -flto caller.cpp app.cpp -o app
With LTO enabled, call_hello is completely inlined and eliminated. The disassembly shows that main directly calls the Derived class's hello function implementation:
00000000000010a0 <main>:
10a0: 48 83 ec 18 sub $0x18,%rsp
10a4: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi
10a9: e8 02 01 00 00 call 11b0 <_ZN7Derived5helloEv>
10ae: 31 c0 xor %eax,%eax
10b0: 48 83 c4 18 add $0x18,%rsp
10b4: c3 ret
10b5: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
10bc: 00 00 00
10bf: 90 nop
There is no vptr load (movq (%rdi), %rax) and no indirect call at all: the vtable lookup is completely eliminated.
Speculative Devirtualization
GCC enables a compromise strategy by default at -O2: Speculative Devirtualization (-fdevirtualize-speculatively). The compiler guesses the most likely target function at compile time. At runtime, the generated code loads the function address from the vtable and compares it with the guess: if the guess is correct, it runs the inlined method (fast path); if the guess is wrong, it falls back to a vtable indirect call (slow path).
// caller.cpp only knows about Base, so the compiler guesses b most likely points to a Base object
auto call_hello(Base* b) -> void {
b->hello();
}
Corresponding assembly output:
call_hello:
movq (%rdi), %rax # 1. Read vptr
leaq _ZN4Base5helloEv(%rip), %rdx # 2. Get Base::hello address (compile-time guess)
movq (%rax), %rax # 3. Read vtable[0]
cmpq %rdx, %rax # 4. Compare
jne .L11 # 5. Wrong guess → fallback
... Inline Base::hello (fast path) ...
.L11:
jmp *%rax # 6. Fallback: vtable indirect call
This is an "optimistic strategy": trading 1 comparison + 1 conditional branch for the complete elimination of the indirect jump when the guess hits. If the hit rate is high, this implementation is worthwhile.
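The guarded call can be approximated in source form. This is a hand-written analogue under stated assumptions, not what GCC emits: the compiler compares the vtable slot against the address of Base::hello, while this sketch compares dynamic types with typeid, because that is what portable C++ can express. The name call_hello_guarded is hypothetical, and hello() returns a string so the result can be checked.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

class Base {
public:
    virtual ~Base() = default;
    virtual auto hello() const -> std::string { return "Hello, from base"; }
};

class Derived : public Base {
public:
    auto hello() const -> std::string override { return "Hello, from derived"; }
};

// Guess that b is exactly a Base; verify the guess before taking the fast path.
auto call_hello_guarded(const Base& b) -> std::string {
    if (typeid(b) == typeid(Base)) {
        return b.Base::hello();  // fast path: qualified call, no virtual dispatch
    }
    return b.hello();            // slow path: ordinary virtual call
}
```

One difference from the compiler's version: a subclass that does not override hello() takes the slow path here, whereas a comparison of function addresses would still hit the fast path because its vtable slot points to Base::hello.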
Impact of Virtual Functions on Object Memory Layout
Empty Class
C++ mandates that every object must have a unique address, so even a class with no members at all occupies at least 1 byte:
class EmptyBase {};
Use clang++ -Xclang -fdump-record-layouts to inspect the object memory layout:
*** Dumping AST Record Layout
0 | class EmptyBase (empty)
| [sizeof=1, dsize=1, align=1,
| nvsize=1, nvalign=1]
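The unique-address rule can be checked in code as well. A small sketch, where distinct_addresses is a hypothetical helper name:

```cpp
#include <cassert>

class EmptyBase {};

// A complete object type never has size 0, even with no members.
static_assert(sizeof(EmptyBase) >= 1, "empty class still occupies storage");

// Two distinct objects of the same type must not share an address,
// which is why array elements of an empty class are still spaced apart.
auto distinct_addresses() -> bool {
    EmptyBase objects[2];
    return &objects[0] != &objects[1];
}
```

If sizeof(EmptyBase) were 0, both array elements would sit at the same address and the comparison could not hold.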
Impact of Virtual Functions
Once a class contains virtual functions, each object gains an additional vptr (8 bytes on 64-bit platforms) used to locate the vtable at runtime:
class Base {
public:
virtual auto hello() -> void { ... }
};
class EmptyDerived : public Base {};
class Derived : public Base {
public:
auto hello() -> void override { ... }
};
Use clang++ -Xclang -fdump-record-layouts to inspect the object memory layout:
*** Dumping AST Record Layout
0 | class Base
0 | (Base vtable pointer)
| [sizeof=8, dsize=8, align=8,
| nvsize=8, nvalign=8]
*** Dumping AST Record Layout
0 | class Derived
0 | class Base (primary base)
0 | (Base vtable pointer)
| [sizeof=8, dsize=8, align=8,
| nvsize=8, nvalign=8]
*** Dumping AST Record Layout
0 | class EmptyDerived
0 | class Base (primary base)
0 | (Base vtable pointer)
| [sizeof=8, dsize=8, align=8,
| nvsize=8, nvalign=8]
If the derived class has no additional data members, it also occupies only 8 bytes:
| Class | size | Composition |
|---|---|---|
| Base | 8 | vptr |
| Derived | 8 | vptr (no data members) |
| EmptyDerived | 8 | vptr (no override, vtable points to Base::hello) |
This can also be verified from the disassembly. When compiled with -O0, you can clearly see the constructor writing the vtable address to the object's base address:
lea vtable_Derived+16, %rax # Get vtable address (skip RTTI header)
mov %rax, -0x8(%rbp) # Write to object's offset 0 ← vptr
lea -0x8(%rbp), %rax # rax = &d
mov %rax, %rdi # this = &d
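The sizes above can also be pinned down at compile time without dumping layouts. A minimal sketch, assuming an ABI that stores exactly one pointer-sized vptr per object (as GCC and Clang do on x86-64); the hello() bodies are left empty because only the layout matters here:

```cpp
#include <type_traits>

class Base {
public:
    virtual auto hello() -> void {}
};

class EmptyDerived : public Base {};

class Derived : public Base {
public:
    auto hello() -> void override {}
};

static_assert(std::is_polymorphic<Base>::value, "Base has a virtual function");

// Assumption: one pointer-sized vptr and nothing else in the object.
static_assert(sizeof(Base) == sizeof(void*), "Base is just a vptr");

// Overriding or inheriting changes the vtable, not the per-object data.
static_assert(sizeof(Derived) == sizeof(Base), "override adds no object data");
static_assert(sizeof(EmptyDerived) == sizeof(Base), "inheriting adds no object data");
```

If any assumption does not hold on a given platform, the translation unit fails to compile, which makes the check cheap to keep next to the class definitions.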
EBO (Empty Base Optimization)
Empty Base Optimization (EBO) means that when a class inherits from an empty base class (one with no data members and no virtual functions), the compiler does not allocate extra space for the empty base class subobject.
class Empty : public EmptyBase { // No extra space consumed
public:
auto hello() -> void { ... } // Non-virtual: EmptyBase declares nothing to override
};
class EBO : public EmptyBase { // Occupies size of int for member x
private:
int x{1};
public:
auto hello() -> void { ... } // Non-virtual: EmptyBase declares nothing to override
};
Use clang++ -Xclang -fdump-record-layouts to inspect the object memory layout:
*** Dumping AST Record Layout
0 | class Empty (empty)
0 | class EmptyBase (base) (empty)
| [sizeof=1, dsize=0, align=1, ← Empty base class optimized away, still 1 byte overall
| nvsize=1, nvalign=1]
*** Dumping AST Record Layout
0 | class EBO
0 | class EmptyBase (base) (empty)
0 | int x
| [sizeof=4, dsize=4, align=4, ← Only int x occupies 4 bytes, empty base class takes no extra space
| nvsize=4, nvalign=4]
However, if the base class has virtual functions, the base class subobject contains a vptr (8 bytes) and is no longer truly "empty". So EBO cannot take effect.
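The three cases can be compared side by side with static_assert. This is a sketch under assumptions: ViaInheritance, ViaMember, VirtualBase, and FromVirtualBase are hypothetical names, and structs with public members are used only to keep the example short.

```cpp
class EmptyBase {};

// Empty base class: EBO applies, so only the int takes space.
struct ViaInheritance : EmptyBase {
    int x{1};
};

// Same empty type held as a member: it needs its own address,
// so it takes at least 1 byte plus padding before the int.
struct ViaMember {
    EmptyBase e;
    int x{1};
};

// Base with a virtual function: it carries a vptr and is not empty.
struct VirtualBase {
    virtual auto hello() -> void {}
};

struct FromVirtualBase : VirtualBase {
    int x{1};
};

static_assert(sizeof(ViaInheritance) == sizeof(int), "empty base takes no space");
static_assert(sizeof(ViaMember) > sizeof(int), "empty member takes space");
static_assert(sizeof(FromVirtualBase) >= sizeof(void*) + sizeof(int),
              "vptr is stored in addition to the int");
```

On x86-64 with GCC or Clang these typically come out as 4, 8, and 16 bytes respectively, the last one including padding after the int.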
Summary
This article demonstrated the impact of class inheritance on virtual function calls and object memory layout in various scenarios by analyzing assembly output and clang++ dump results.
To reduce the overhead of virtual function calls, we can choose to:
- Mark classes as final
- Manage code translation units sensibly
- Enable LTO compilation
- Leverage speculative devirtualization
To reduce the space overhead of inheritance, consider restructuring the code so that the base class has no virtual functions and no data members, which lets EBO apply.
The compiled instructions are generated with g++ 14.2. The record layouts are dumped with clang++ 20.1.