The GMP Model of Goroutines

Goroutines are one of the foundational facilities for concurrent programming in the Go language. A Goroutine is an abstraction of a sequential flow of computation. In form, a Goroutine resembles a thread, since the logic within a single Goroutine executes sequentially. However, Goroutines are much more lightweight. By default, a Goroutine's stack starts at only 2KB and grows only when necessary. The book Concurrency in Go compares the context-switching overhead of Linux system threads and Goroutines: a Linux thread context switch takes about 1.4μs on average, while a Goroutine switch takes only about 200ns, roughly 15% of the thread's cost.
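To get a feel for how cheap Goroutines are, the following minimal sketch starts a large number of them and waits for all of them to finish. The helper name spawn and the count of 100000 are illustrative choices, not something taken from the runtime or from the book.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// spawn starts n Goroutines, waits until every one of them has finished,
// and returns how many actually ran. (Hypothetical helper for illustration.)
func spawn(n int) int64 {
	var wg sync.WaitGroup
	var ran int64
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&ran, 1) // safe concurrent increment
		}()
	}
	wg.Wait()
	return ran
}

func main() {
	// Starting this many OS threads would be costly; for Goroutines
	// it is routine because each one begins with a small stack.
	fmt.Println(spawn(100000))
}
```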

The GMP Model

Although Goroutines may look like threads, they are not threads. The smallest unit that an operating system schedules is a thread, and all code ultimately runs on a thread. Goroutines are no exception: they need threads to actually run.

Go's runtime uses the GMP model when executing Goroutines:

  • G — Goroutine
  • M — Machine
  • P — Processor

P is the logical processor of the Go runtime. M (Machine) abstracts an OS thread and is what actually executes Goroutines. To run Go code, an M must be bound to a P. At any given moment a P is bound to at most one M, while that M runs different Gs over time.

Goroutine

Unlike an ordinary function or struct, a Goroutine is an independent, sequential flow of execution. Like a thread, it therefore needs its own program counter (PC) and stack to record its execution state. A Goroutine saves this state when it is suspended and restores it when it resumes.
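The idea of "a program counter plus a private stack" can be illustrated with a toy model. The sketch below is not how the runtime represents a Goroutine; the type toyG and the helpers push, add, newAdder and runInterleaved are all hypothetical names invented for this example. Each toyG runs a tiny three-instruction program, and the driver switches between two of them after every instruction. Because each one keeps its own pc and stack, the interleaving does not disturb either result.

```go
package main

import "fmt"

// toyG is a hypothetical, heavily simplified stand-in for a Goroutine:
// a saved program counter, a private stack, and the code it runs.
type toyG struct {
	name  string
	pc    int            // index of the next instruction to execute
	stack []int          // private data stack
	code  []func(*toyG)  // the "program"
}

// step resumes g at its saved pc, executes one instruction, and saves
// the new pc. It reports whether any instructions remain.
func (g *toyG) step() bool {
	if g.pc >= len(g.code) {
		return false
	}
	g.code[g.pc](g)
	g.pc++
	return g.pc < len(g.code)
}

// push returns an instruction that pushes v onto the stack.
func push(v int) func(*toyG) {
	return func(g *toyG) { g.stack = append(g.stack, v) }
}

// add pops the top two values and pushes their sum.
func add(g *toyG) {
	n := len(g.stack)
	sum := g.stack[n-2] + g.stack[n-1]
	g.stack = append(g.stack[:n-2], sum)
}

// newAdder builds a toyG whose program computes a+b.
func newAdder(name string, a, b int) *toyG {
	return &toyG{name: name, code: []func(*toyG){push(a), push(b), add}}
}

// runInterleaved switches between the given toyGs after every single
// instruction until all of them have finished.
func runInterleaved(gs ...*toyG) {
	for running := true; running; {
		running = false
		for _, g := range gs {
			if g.step() {
				running = true
			}
		}
	}
}

func main() {
	a := newAdder("a", 1, 2)
	b := newAdder("b", 10, 20)
	runInterleaved(a, b)
	fmt.Printf("%s=%v %s=%v\n", a.name, a.stack, b.name, b.stack) // prints: a=[3] b=[30]
}
```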

When a Machine executes a Goroutine, it is essentially advancing the program counter and execution stack that belong to that Goroutine. However, an M cannot execute a single Goroutine indefinitely, or concurrency would be impossible. The M needs to switch to other Goroutines under appropriate circumstances. To maintain performance, Goroutines should not be suspended and resumed arbitrarily. When a Goroutine's stack and state are no longer in the L1/L2 cache, a context switch has to load the corresponding data from the L3 cache or main memory, and time spent on cache misses is wasted CPU time. Choosing the right switch points is therefore critical. In Go's runtime implementation, Goroutines are switched under the following conditions:

  • Blocking occurs, such as a system call, or blocking on a mutex or Channel.
  • At a function call, when the compiler-inserted check at function entry finds that the Goroutine's stack needs to grow. The runtime is entered at this point anyway to allocate memory and copy the stack, so it is a reasonable place to also yield the processor.
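The first condition, blocking on a Channel, can be observed from ordinary Go code. In the sketch below (the function name pingPong and the channel names are made up for this example), each side blocks on an unbuffered Channel until the other side is ready, so control alternates strictly between the two Goroutines. The shared slice is safe to use here only because every access is ordered by a Channel send and receive.

```go
package main

import "fmt"

// pingPong lets the calling Goroutine and a second Goroutine take turns.
// Every send or receive on an unbuffered Channel blocks until the other
// side arrives, and each such block is a point where the scheduler can
// switch Goroutines.
func pingPong(rounds int) []string {
	var log []string
	ping := make(chan struct{})
	pong := make(chan struct{})
	done := make(chan struct{})

	go func() {
		defer close(done)
		for i := 0; i < rounds; i++ {
			<-ping // blocks until the caller sends
			log = append(log, "pong")
			pong <- struct{}{} // blocks until the caller receives
		}
	}()

	for i := 0; i < rounds; i++ {
		log = append(log, "ping")
		ping <- struct{}{} // blocks until the other Goroutine receives
		<-pong             // blocks until the other Goroutine replies
	}
	<-done
	return log
}

func main() {
	fmt.Println(pingPong(2)) // prints: [ping pong ping pong]
}
```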

Machine

As described above, Goroutine scheduling is largely cooperative: a Goroutine gives up its M at well-defined points. (Since Go 1.14 the runtime can additionally preempt a long-running Goroutine asynchronously.) Each P maintains a local run queue, and the M bound to that P executes the Goroutines in it in order. The local queue has limited capacity, so additional ready-to-run Goroutines are placed in a global queue. When the local queue of an M's P has no runnable Gs, the M first tries to fetch work from the global queue. If the global queue is also empty, it steals work from the local queues of other Ps. Taken together, this mechanism forms a work-stealing scheduling model.
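The lookup order described above (local queue, then global queue, then stealing) can be sketched as a small simulation. This is a toy model, not the runtime's code: the types toyP and toySched, the capacity of 4, and the policy of stealing half of the victim's queue are assumptions made for this example, and the real scheduler has more steps and uses lock-free queues.

```go
package main

import "fmt"

// localCap is a hypothetical capacity chosen to keep the example small.
const localCap = 4

// toyP models a logical processor with a bounded local run queue
// holding Goroutine IDs.
type toyP struct {
	id    int
	local []int
}

// toySched models the scheduler: a set of Ps plus one global queue.
type toySched struct {
	ps     []*toyP
	global []int
}

func newToySched(n int) *toySched {
	s := &toySched{}
	for i := 0; i < n; i++ {
		s.ps = append(s.ps, &toyP{id: i})
	}
	return s
}

// enqueue puts g on p's local queue and spills to the global queue
// once the local queue is full.
func (s *toySched) enqueue(p *toyP, g int) {
	if len(p.local) < localCap {
		p.local = append(p.local, g)
		return
	}
	s.global = append(s.global, g)
}

// findRunnable picks the next G for p: local queue first, then the
// global queue, then stealing from another P.
func (s *toySched) findRunnable(p *toyP) (g int, from string, ok bool) {
	if len(p.local) > 0 {
		g, p.local = p.local[0], p.local[1:]
		return g, "local", true
	}
	if len(s.global) > 0 {
		g, s.global = s.global[0], s.global[1:]
		return g, "global", true
	}
	for _, victim := range s.ps {
		if victim == p || len(victim.local) == 0 {
			continue
		}
		// Assumed policy: take the back half of the victim's queue
		// (rounded up), run the first stolen G and keep the rest locally.
		n := (len(victim.local) + 1) / 2
		stolen := victim.local[len(victim.local)-n:]
		victim.local = victim.local[:len(victim.local)-n]
		p.local = append(p.local, stolen[1:]...)
		return stolen[0], "stolen", true
	}
	return 0, "", false
}

func main() {
	s := newToySched(2)
	for g := 1; g <= 6; g++ {
		s.enqueue(s.ps[0], g) // G1..G4 stay local to P0, G5 and G6 spill to global
	}
	// P1 starts with an empty local queue and has to look elsewhere.
	for i := 0; i < 4; i++ {
		g, from, _ := s.findRunnable(s.ps[1])
		fmt.Printf("P1 runs G%d (%s)\n", g, from)
	}
}
```

With the numbers above, P1 takes G5 and G6 from the global queue, then steals G3 and G4 from P0, running G3 immediately and G4 from its own local queue on the next lookup.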

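All of the work between suspending one Goroutine and resuming another, such as saving state and picking the next G, is performed by a special Goroutine called g0. Each M has its own g0, which runs scheduler code on a separate stack.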

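A Goroutine that has just been suspended because it is blocked cannot continue until the condition it is waiting on is satisfied, so it is taken out of the run queues and placed in a wait queue (for example, that of the Channel or mutex it blocked on). Once the condition is lifted, the Goroutine is moved back to a run queue to wait for execution.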

Processor

The logical processor of the Go runtime is not the same thing as a physical CPU of the machine. The number of Ps is determined by the runtime's GOMAXPROCS setting, which by default is typically the number of logical CPUs available to the process.
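The number of Ps can be inspected and changed at run time through the standard runtime package. In the short sketch below, the wrapper names currentPs and setPs are made up for this example; runtime.GOMAXPROCS and runtime.NumCPU are the actual library calls.

```go
package main

import (
	"fmt"
	"runtime"
)

// currentPs reports the current GOMAXPROCS value. Passing 0 (or any
// value below 1) to runtime.GOMAXPROCS only queries the setting.
func currentPs() int {
	return runtime.GOMAXPROCS(0)
}

// setPs changes the number of Ps and returns the previous setting.
func setPs(n int) int {
	return runtime.GOMAXPROCS(n)
}

func main() {
	fmt.Println("logical CPUs:", runtime.NumCPU())
	fmt.Println("Ps (GOMAXPROCS):", currentPs())

	prev := setPs(2) // limit the runtime to two Ps
	fmt.Println("Ps after change:", currentPs())
	setPs(prev) // restore the earlier value
}
```

The same setting can also be supplied from outside the program through the GOMAXPROCS environment variable.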

References