The Price of Inheritance in C++

Inheritance is a common technique in object-oriented C++, used to layer code so that related logic stays together: shared logic lives in the base class, and specialized logic lives in the derived class.

#include <iostream>

class Base {
public:
    virtual auto hello() -> void {
        std::cout << "Hello, from base" << std::endl;
    }
};

class Derived : public Base {
public:
    auto hello() -> void override {
        std::cout << "Hello, from derived" << std::endl;
    }
};

If we don't want third parties to know about the concrete implementation (Derived), we can hand them a pointer to Base:

Base* b = new Derived();
b->hello();

Or allow direct use of the derived class Derived:

Derived* d = new Derived();
d->hello();

Regardless of which approach is used, we can correctly invoke the Derived class's implementation and get the output:

Hello, from derived
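
The two snippets above allocate with new and never release the object, and deleting a Derived through a Base* would also need a virtual destructor. As a minimal sketch of both call styles with explicit ownership, assuming we are free to change the classes: hello() returns its greeting instead of printing it (so the result is easy to check), Base gains a virtual destructor, and make_greeter and make_derived are hypothetical helper names that do not appear in the original code.

```cpp
#include <memory>
#include <string>

// Hypothetical variants of Base and Derived: hello() returns the greeting,
// and Base gains a virtual destructor.
class Base {
public:
    virtual ~Base() = default;  // lets a Derived be destroyed through Base*
    virtual auto hello() const -> std::string { return "Hello, from base"; }
};

class Derived : public Base {
public:
    auto hello() const -> std::string override { return "Hello, from derived"; }
};

// First approach: the caller only ever sees Base.
auto make_greeter() -> std::unique_ptr<Base> {
    return std::unique_ptr<Base>(new Derived());
}

// Second approach: the caller works with Derived directly.
auto make_derived() -> std::unique_ptr<Derived> {
    return std::unique_ptr<Derived>(new Derived());
}
```

Both make_greeter()->hello() and make_derived()->hello() run Derived's implementation; the unique_ptr releases the object when it goes out of scope.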

However, the cost behind each approach is different.

Virtual Function Call Overhead

When calling a virtual function through a base class pointer, the compiler cannot determine the actual function implementation at compile time and must query the vtable (virtual function table) at runtime for an indirect call.

On a typical 64-bit platform (for example, the Itanium C++ ABI used by GCC and Clang), every object with virtual functions stores a vptr (virtual function table pointer) in its first 8 bytes, pointing to the vtable for its class. The vtable contains a table of function pointers.

For hello(), which occupies the first vtable slot, the table lookup and call take two instructions (the call is a tail call here, so it appears as a jmp):

movq    (%rdi), %rax     # Read first 8 bytes of object → vptr → vtable address
jmp     *(%rax)          # Fetch function address from vtable[0], indirect jump

Compared to a direct call, a virtual function call adds:

  • 1 extra memory access (reading the vptr)
  • 1 indirect jump (which the CPU predicts less reliably than a direct jump, especially when the target changes from call to call, and which prevents inlining)

This is the typical runtime overhead of virtual function calls.
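
To make the two steps concrete, here is a hand-rolled model of the same dispatch in plain C++. It is only a sketch of the idea, not what the compiler literally emits: VTable, Object, base_hello, derived_hello, and call_hello are hypothetical names, and the functions return strings instead of printing so the result can be checked.

```cpp
#include <string>

// Hand-rolled model of virtual dispatch. Every name in this block is
// hypothetical and only mirrors the compiler-generated structures.
struct Object;

struct VTable {
    std::string (*hello)(const Object*);  // slot 0: address of hello()
};

struct Object {
    const VTable* vptr;  // stored at offset 0, like the real vptr
};

auto base_hello(const Object*) -> std::string { return "Hello, from base"; }
auto derived_hello(const Object*) -> std::string { return "Hello, from derived"; }

// One table per "class", shared by all objects of that class.
const VTable base_vtable{&base_hello};
const VTable derived_vtable{&derived_hello};

auto call_hello(const Object* obj) -> std::string {
    const VTable* table = obj->vptr;  // memory access: load the vptr
    return table->hello(obj);         // load slot 0, then call indirectly
}
```

With Object d{&derived_vtable}, call_hello(&d) returns "Hello, from derived" without call_hello knowing which table it was handed, which is the property the indirect jump pays for.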

Devirtualization

If the compiler can determine the actual type at compile time, it can skip the vtable lookup step and call the target function directly, or even inline the function body. This is called Devirtualization.

Direct Use of the Derived Class

When calling a virtual function directly through an object (rather than a pointer/reference), the compile-time type is fully determined:

auto DerivedHello() -> void {
    Derived d{};
    d.hello();  // The compiler knows d is a Derived, so it can inline Derived::hello
}

After compilation, the body of hello() is fully inlined, with no vtable lookup and no call to hello(), just the write to std::cout:

DerivedHello:
    movl    $19, %edx                      # String length
    leaq    .LC0(%rip), %rsi               # "Hello, from derived"
    leaq    _ZSt4cout(%rip), %rbx
    movq    %rbx, %rdi
    call    _ZSt16__ostream_insert...      # Direct call to the ostream insertion helper
    ...

The final Keyword

Marking a class final prohibits further inheritance, which guarantees to the compiler that no subclass can override its virtual functions. Even a call through a pointer can then be fully devirtualized:

class FinalDerived final : public Base {
public:
    auto hello() -> void override {
        std::cout << "Hello, from final derived" << std::endl;
    }
};

auto call_via_final(FinalDerived* f) -> void {
    f->hello();  // FinalDerived is final, no subclasses possible, can inline directly
}

Here, after using final inheritance, the compiled output has no vptr read at all:

call_via_final:
    leaq    _ZSt4cout(%rip), %rbp
    movl    $25, %edx
    leaq    .LC2(%rip), %rsi           # "Hello, from final derived" string address
    call    _ZSt16__ostream_insert...  # Direct inline, no vptr read
    ...
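
Comparison of a call through Derived* and through FinalDerived* (the compare + fallback branch is explained under Speculative Devirtualization below):

                     Derived* (non-final)                                  FinalDerived* (final)
-------------------  ----------------------------------------------------  ----------------------
Can have subclasses  Yes (other translation units may define them)         No, final prohibits it
vptr read            Required                                              Not needed
Branch               Yes (compare + fallback)                              No
Runtime overhead     1 memory access + 1 comparison + 1 branch prediction  0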
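
The same guarantee can be given for a single virtual function rather than for a whole class, since C++ also accepts final on a member function. The sketch below is illustrative only: SealedHello, SealedClass, and the two call_ helpers are hypothetical names, hello() returns a string so the result can be checked, and whether a given compiler actually emits a direct call still depends on the optimization level.

```cpp
#include <string>
#include <type_traits>

class Base {
public:
    virtual ~Base() = default;
    virtual auto hello() const -> std::string { return "Hello, from base"; }
};

// Hypothetical class: still open for inheritance, but hello() is sealed.
class SealedHello : public Base {
public:
    auto hello() const -> std::string final { return "Hello, from sealed hello"; }
};

// Hypothetical class: the whole class is sealed, as with FinalDerived.
class SealedClass final : public Base {
public:
    auto hello() const -> std::string override { return "Hello, from sealed class"; }
};

static_assert(!std::is_final<SealedHello>::value, "the class can still be derived from");
static_assert(std::is_final<SealedClass>::value, "the class cannot be derived from");

// In both functions the static type pins down the target of hello(),
// so the compiler is allowed to call it directly.
auto call_sealed_hello(const SealedHello* p) -> std::string { return p->hello(); }
auto call_sealed_class(const SealedClass* p) -> std::string { return p->hello(); }
```

Sealing only the function keeps the class extensible while still ruling out further overrides of hello().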

Parent and Derived Classes in the Same Translation Unit

If the virtual function call and object construction are in the same translation unit, the compiler can see all type information and devirtualize directly. But if they are split into different translation units, the situation changes.

We place the virtual function call in a separate .cpp file that only includes base.h and knows nothing about Derived:

// caller.cpp — only includes base.h, unaware of Derived's existence
#include "caller.h"

auto call_hello(Base* b) -> void {
    b->hello();  // The compiler is forced to go through the vtable
}

// app.cpp — creates Derived, passes it to call_hello
#include "caller.h"
#include "derived.h"

auto main() -> int {
    Derived d{};
    call_hello(&d);
    return 0;
}

When compiling caller.cpp, the compiler only sees the Base class and doesn't know about the Derived class, so it must conservatively generate a vtable indirect call:

call_hello:
    ....
    movq    (%rdi), %rax    # Read vptr
    ...
    jmp     *%rax           # vtable indirect call
    ...

Enabling LTO

LTO (Link-Time Optimization) merges the intermediate representation (IR) of all translation units at the link stage. At this point, the compiler can see the actual types across files and devirtualize:

g++ -O2 -flto caller.cpp app.cpp -o app

With LTO enabled, call_hello is completely inlined and eliminated. The disassembly shows that main directly calls the Derived class's hello function implementation:

00000000000010a0 <main>:
    10a0:       48 83 ec 18             sub    $0x18,%rsp
    10a4:       48 8d 7c 24 08          lea    0x8(%rsp),%rdi
    10a9:       e8 02 01 00 00          call   11b0 <_ZN7Derived5helloEv>
    10ae:       31 c0                   xor    %eax,%eax
    10b0:       48 83 c4 18             add    $0x18,%rsp
    10b4:       c3                      ret
    10b5:       66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
    10bc:       00 00 00
    10bf:       90                      nop

There is no movq (%rdi), %rax vptr load and no indirect jump at all: the vtable lookup is completely eliminated.

Speculative Devirtualization

At -O2, GCC enables a compromise strategy by default (-fdevirtualize-speculatively): speculative devirtualization. The compiler guesses the most likely target function and emits code that loads the function pointer from the vtable and compares it with that guess. If the guess is correct, an inlined copy of the method runs (fast path); if it is wrong, the code falls back to a vtable indirect call (slow path).

// caller.cpp only knows about Base, so the compiler guesses that the call most likely targets Base::hello
auto call_hello(Base* b) -> void {
    b->hello();
}

Corresponding assembly output:

call_hello:
    movq    (%rdi), %rax                      # 1. Read vptr
    leaq    _ZN4Base5helloEv(%rip), %rdx      # 2. Get Base::hello address (compile-time guess)
    movq    (%rax), %rax                      # 3. Read vtable[0]
    cmpq    %rdx, %rax                        # 4. Compare
    jne     .L11                              # 5. Wrong guess → fallback

    ...  Inline Base::hello (fast path) ...

.L11:
    jmp     *%rax                             # 6. Fallback: vtable indirect call

This is an "optimistic strategy": it adds 1 comparison and 1 conditional branch, and in exchange the indirect jump disappears (and the callee can be inlined) whenever the guess hits. The vptr and vtable loads remain on both paths. If the hit rate is high, the trade is worthwhile.
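
The same guard can be written by hand, which may help when reading the assembly. This is a sketch under stated assumptions, not GCC's transformation: GCC compares the vtable slot against the address of Base::hello, while this version compares the dynamic type with typeid, so a subclass that inherits hello() without overriding it takes the slow path here but would take the fast path in GCC's version. call_hello_guarded is a hypothetical name, and hello() returns a string so the result can be checked.

```cpp
#include <string>
#include <typeinfo>

class Base {
public:
    virtual ~Base() = default;
    virtual auto hello() const -> std::string { return "Hello, from base"; }
};

class Derived : public Base {
public:
    auto hello() const -> std::string override { return "Hello, from derived"; }
};

// Hypothetical hand-written guard around the virtual call.
auto call_hello_guarded(const Base& b) -> std::string {
    if (typeid(b) == typeid(Base)) {
        // Fast path: a qualified call is never dispatched virtually,
        // so the compiler can inline it.
        return b.Base::hello();
    }
    // Slow path: ordinary virtual dispatch.
    return b.hello();
}
```

Both paths return the same result as a plain b.hello(); only the way the target is reached differs.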

Impact of Virtual Functions on Object Memory Layout

Empty Class

C++ requires every complete object to have a nonzero size so that distinct objects have distinct addresses; even a class with no members at all therefore occupies at least 1 byte:

class EmptyBase {};

Use clang++ -Xclang -fdump-record-layouts to inspect the object memory layout:

*** Dumping AST Record Layout
         0 | class EmptyBase (empty)
           | [sizeof=1, dsize=1, align=1,
           |  nvsize=1, nvalign=1]

Impact of Virtual Functions

Once a class contains virtual functions, each object gains an additional vptr (8 bytes on a 64-bit platform) used to locate the vtable at runtime:

class Base {
public:
    virtual auto hello() -> void { ... }
};

class EmptyDerived : public Base {};

class Derived : public Base {
public:
    auto hello() -> void override { ... }
};

Use clang++ -Xclang -fdump-record-layouts to inspect the object memory layout:

*** Dumping AST Record Layout
         0 | class Base
         0 |   (Base vtable pointer)
           | [sizeof=8, dsize=8, align=8,
           |  nvsize=8, nvalign=8]

*** Dumping AST Record Layout
         0 | class Derived
         0 |   class Base (primary base)
         0 |     (Base vtable pointer)
           | [sizeof=8, dsize=8, align=8,
           |  nvsize=8, nvalign=8]

*** Dumping AST Record Layout
         0 | class EmptyDerived
         0 |   class Base (primary base)
         0 |     (Base vtable pointer)
           | [sizeof=8, dsize=8, align=8,
           |  nvsize=8, nvalign=8]

If the derived class has no additional data members, it also occupies only 8 bytes:

Class         sizeof (bytes)  Composition
------------  --------------  -----------------------------------------------
Base          8               vptr
Derived       8               vptr (no data members)
EmptyDerived  8               vptr (no override, vtable points to Base::hello)
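
These sizes can be pinned down with static_assert. The sketch below assumes one vptr per object with pointer size and alignment, which holds on mainstream ABIs but is not promised by the C++ standard; DerivedWithInt is a hypothetical class added to show that a data member next to the vptr is padded up to pointer alignment.

```cpp
class Base {
public:
    virtual auto hello() -> void {}
};

class Derived : public Base {
public:
    auto hello() -> void override {}
};

// Hypothetical class: one int stored after the inherited vptr.
class DerivedWithInt : public Base {
public:
    int x{1};
};

// Assumption: a single pointer-sized vptr per object (typical, not guaranteed).
static_assert(sizeof(Base) == sizeof(void*), "Base holds only the vptr");
static_assert(sizeof(Derived) == sizeof(Base), "an override adds no per-object data");
static_assert(sizeof(DerivedWithInt) == 2 * sizeof(void*),
              "vptr plus int, padded to pointer alignment");
```

On a 64-bit target this means DerivedWithInt occupies 16 bytes: 8 for the vptr, 4 for x, and 4 of padding.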

This can also be verified from the disassembly. When compiled with -O0, you can clearly see the constructor writing the vtable address to the object's base address:

lea    vtable_Derived+16, %rax   # Get vtable address (skip offset-to-top and RTTI pointer)
mov    %rax, -0x8(%rbp)          # Write to object's offset 0 ← vptr
lea    -0x8(%rbp), %rax          # rax = &d
mov    %rax, %rdi                # this = &d

EBO (Empty Base Optimization)

Empty Base Optimization (EBO) means that when a base class is empty (no non-static data members, no virtual functions, and no virtual base classes), the compiler does not allocate extra space for the base class subobject inside the derived class.

class Empty : public EmptyBase {  // No extra space consumed
public:
    auto hello() -> void { ... }  // Non-virtual: EmptyBase declares nothing to override
};

class EBO : public EmptyBase {    // Occupies only the size of int for member x
private:
    int x{1};
public:
    auto hello() -> void { ... }
};

Use clang++ -Xclang -fdump-record-layouts to inspect the object memory layout:

*** Dumping AST Record Layout
         0 | class Empty (empty)
         0 |   class EmptyBase (base) (empty)
           | [sizeof=1, dsize=0, align=1,  ← Empty base class optimized away, still 1 byte overall
           |  nvsize=1, nvalign=1]

*** Dumping AST Record Layout
         0 | class EBO
         0 |   class EmptyBase (base) (empty)
         0 |   int x
           | [sizeof=4, dsize=4, align=4,  ← Only int x occupies 4 bytes, empty base class takes no extra space
           |  nvsize=4, nvalign=4]

However, if the base class has virtual functions, the base class subobject contains a vptr (8 bytes on a 64-bit platform) and is no longer truly "empty", so EBO does not apply.
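
A compile-time sketch of the difference, using hypothetical class names (ViaInheritance, ViaMember, VirtualBase). It contrasts inheriting from the empty class with holding it as an ordinary data member, where the member needs its own address and padding follows:

```cpp
#include <type_traits>

class EmptyBase {};

// Inheritance: the empty base subobject takes no space of its own.
class ViaInheritance : public EmptyBase {
public:
    int x{1};
};

// Composition: the empty member needs its own address, so the class grows.
class ViaMember {
public:
    EmptyBase e;
    int x{1};
};

// A base with a virtual function carries a vptr and is not empty.
class VirtualBase {
public:
    virtual auto hello() -> void {}
};

static_assert(sizeof(ViaInheritance) == sizeof(int), "EBO applied");
static_assert(sizeof(ViaMember) > sizeof(ViaInheritance), "no overlap for a plain member");
static_assert(!std::is_empty<VirtualBase>::value, "the vptr makes the class non-empty");
```

On typical targets ViaMember is 8 bytes: 1 byte for e, 3 bytes of padding, and 4 bytes for x.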

Summary

This article demonstrated the impact of class inheritance on virtual function calls and object memory layout in various scenarios by analyzing assembly output and clang++ dump results.

To reduce the overhead of virtual function calls, we can choose to:

  • Mark classes as final
  • Keep object construction and the virtual call visible in the same translation unit where practical
  • Enable LTO compilation
  • Rely on speculative devirtualization (enabled by default in GCC at -O2)

To reduce the space overhead of inheritance, consider restructuring the code so that the base class has no virtual methods and no data members, which lets EBO apply.

The assembly listings were generated with g++ 14.2. The record layouts were dumped with clang++ 20.1.