Hugepages and Transparent Hugepages in Linux

2023-10-28

There are two concepts in Linux's memory management module: hugepages and transparent hugepages (THP).

Memory Pages and TLB

Linux memory management uses the page as its fundamental unit. Each process has a virtual address space whose pages are mapped to physical memory, and each mapped page corresponds to a contiguous region of physical memory.

The TLB (Translation Lookaside Buffer) is a physical unit within the CPU that caches memory page mapping records. The TLB itself is very fast, close to register speed, and faster than L1 CPU cache. However, the TLB has very limited capacity. Taking the AMD Zen+ architecture as an example:

Cache type        L0    L1    L2
Instruction TLB    8    64    512  (1GB pages not allowed)
Data TLB           -    64    1532 (1GB pages not allowed)

The classic memory page size in Linux is 4KB. If every TLB-cached page is 4KB, the roughly 2,180 entries above (8 + 64 + 512 + 64 + 1532) can only cover 8720KB of memory. For mappings beyond this range, each access incurs a TLB miss, after which the system must walk the page tables to load the missing entry.

8720KB of memory is far from sufficient for today's computing ecosystem.

Hugepages

To improve TLB caching efficiency, operating systems introduced the concept of hugepages, which raise the page size well beyond 4KB. On x86-64, Linux supports 2MB and 1GB hugepages.

As a result, process memory requires fewer page mapping records.
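The reduction is easy to quantify. A minimal sketch of the arithmetic for 1GiB of mapped memory, comparing 4KB pages with 2MB hugepages:

```shell
# Mapping records needed for 1 GiB of memory
echo $(( (1 << 30) / (4 << 10) ))   # 4 KB pages  -> 262144 entries
echo $(( (1 << 30) / (2 << 20) ))   # 2 MB pages  -> 512 entries
```

With 2MB hugepages, one TLB entry covers 512 times as much memory, so far fewer entries are needed to cover a process's working set.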

Disadvantages of Hugepages

Many processes run on a typical system, and their continual creation and termination fragments physical memory over time. If a process requests hugepage memory at a moment when the kernel cannot find a contiguous physical region large enough to satisfy the request, the allocation fails. Few applications are designed to handle this kind of runtime error explicitly, so the resulting failures degrade the user experience.
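Before relying on explicit hugepages, it is worth checking how many the kernel currently has reserved and free; on Linux this is reported in /proc/meminfo:

```shell
# How many hugepages are reserved and still available right now
grep -E 'HugePages_(Total|Free)' /proc/meminfo
```

If HugePages_Free is zero (or the pool was never reserved), a hugepage-backed allocation will fail even though ordinary memory is plentiful.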

Transparent Hugepages

To address the issues with hugepages, Linux introduced the concept of transparent hugepages. THP is a kernel mechanism that automatically merges standard 4KB pages into hugepages behind the scenes, without requiring applications to explicitly request hugepage memory. This process is invisible to user processes, hence the term "transparent."

When the system does not have a sufficiently large contiguous physical memory region to fulfill a memory allocation, transparent hugepages can use several smaller memory regions to complete the allocation. Later, when a sufficiently large physical memory region becomes available, the system will automatically migrate the process's memory to that region via the khugepaged kernel thread.
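The khugepaged thread's scanning behaviour is tunable through sysfs; the paths below are those exposed by typical kernels with THP compiled in (the fallback message covers kernels without it):

```shell
# How many pages khugepaged scans per pass, and how long it sleeps between passes
d=/sys/kernel/mm/transparent_hugepage/khugepaged
cat "$d/pages_to_scan" 2>/dev/null || echo "khugepaged tunables not available on this kernel"
cat "$d/scan_sleep_millisecs" 2>/dev/null || true
```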

Currently, Linux supports transparent hugepages in two scenarios: anonymous memory mappings and tmpfs/shmem.

Disadvantages of Transparent Hugepages

As can be seen, transparent hugepages come with drawbacks of their own:

An allocation may initially be satisfied with several smaller, non-contiguous regions, which offers worse locality than a true hugepage. The later merging and migration work performed by khugepaged consumes CPU time and can stall the process at unpredictable moments. And user processes have no way of perceiving or controlling these behaviors.

Choosing a Memory Management Approach

Hugepages and transparent hugepages have similar names, but their runtime behaviors are quite different. Therefore, we need to choose different management approaches based on the characteristics of the processes running on the Linux system.

For example, in desktop environments, there is a wide variety of processes actually running, and processes are constantly being created and terminated. This causes memory fragmentation to become significant after the system has been running for a while. To better serve desktop users, transparent hugepages are essential.

In server scenarios, on the other hand, the system's processes are more stable and memory fragmentation is not as severe. Here the advantages of transparent hugepages are difficult to realize, while their disadvantages are amplified. Server processes often require large amounts of memory, and non-contiguous memory regions perform poorly in this regard. Additionally, the automatic memory migration described above can further degrade process performance.
The official documentation for both MongoDB and Oracle requires transparent hugepages to be disabled. More extreme still, MySQL's third-party TokuDB engine refuses to start when transparent hugepages are enabled.

Percona has a blog post that tested PostgreSQL 10 performance with transparent hugepages enabled and disabled. With transparent hugepages enabled, the database's TLB cache hit rate dropped by one to two percentage points, and there was a clear gap in TPS throughput.

For processes that use large amounts of memory, standard hugepages are worth configuring, because a high TLB cache hit rate is one of the foundations of process performance.

Usage

Hugepages

The available number of hugepages in Linux is set via the kernel parameter vm.nr_hugepages.

To view:

$> sudo sysctl -a | grep vm.nr_hugepages
vm.nr_hugepages = 0
...

To modify the available number of hugepages, edit /etc/sysctl.conf:

vm.nr_hugepages = 100

Then execute the following command to apply the parameters:

sysctl -p

Transparent Hugepages

The enabled status of transparent hugepages can be viewed via the file /sys/kernel/mm/transparent_hugepage/enabled. This file can also be written to, but the setting is reset after a system reboot. To configure it at boot time, append the kernel boot parameter transparent_hugepage=[always|madvise|never] in the grub.conf file.
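Reading the file shows all supported policies, with the active one in brackets (the exact set depends on the kernel build; the fallback covers kernels without THP):

```shell
# Active THP policy, e.g. "always [madvise] never"
cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null \
  || echo "transparent hugepages not supported on this kernel"
```

A runtime change (until the next reboot) is a matter of writing one of those words back, e.g. `echo never > /sys/kernel/mm/transparent_hugepage/enabled` as root.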

Monitoring

To view the transparent hugepage usage for a given process, you can check the /proc/<pid>/smaps file for that process. This file contains all virtual memory mapping entries for the process. Each entry includes fields such as AnonHugePages, ShmemPmdMapped, and FilePmdMapped, which report how much of that mapping is backed by transparent hugepages mapped at the PMD level.
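A quick way to total a process's anonymous THP usage is to sum the AnonHugePages fields across all of its smaps entries; here the current shell (/proc/self) stands in for an arbitrary PID:

```shell
# Total anonymous transparent-hugepage memory for this process, in kB
grep AnonHugePages /proc/self/smaps | awk '{sum += $2} END {print sum+0, "kB"}'
```

A result of 0 kB simply means no mapping of the process is currently backed by a THP.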

What does "mapped at the PMD level" mean?

x86-64 uses a 4-level page table structure to translate virtual addresses to physical addresses:

PGD (Page Global Directory)
  → P4D (Page 4th Level Directory)
    → PUD (Page Upper Directory)
      → PMD (Page Middle Directory)
        → PTE (Page Table Entry) → 4KB page

Normally, the PMD points to a PTE table, which then maps individual 4KB pages. When a hugepage is used, the PMD entry points directly to a contiguous physical memory region (e.g., 2MB), bypassing the PTE level entirely. This is called a "PMD mapping" — it reduces page table traversal depth and improves TLB efficiency. The "PmdMapped" in these fields means the THP is mapped at this PMD level (2MB on x86-64), rather than being composed of multiple 4KB PTE entries.
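The 2MB figure falls straight out of the paging arithmetic: each page-table level indexes 9 bits of the virtual address, and a 4KB page covers the final 12 bits, so a PMD entry that maps memory directly covers 2^(9+12) bytes:

```shell
# Bytes covered by one direct PMD mapping: 2^(9 index bits + 12 offset bits)
echo $(( 1 << (9 + 12) ))           # 2097152 bytes = 2 MiB
# Number of 4 KB PTE entries one such PMD mapping replaces
echo $(( (1 << 21) / (1 << 12) ))   # 512
```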

Additional Information

For more information on using hugepages and transparent hugepages, please refer to the Linux kernel documentation.

References