The GMP Model of Goroutines

Goroutines are one of the foundational facilities for concurrent programming in Go. A Goroutine is an abstraction of a sequential flow of computation. In form it resembles a thread, since the logic within one Goroutine executes sequentially, but it is far more lightweight. By default, a Goroutine starts with a stack of only 2 KB, which grows only when necessary. The book Concurrency in Go compares the context-switching overhead of Linux threads and Goroutines: a Linux thread context switch takes about 1.4 μs on average, while a Goroutine switch takes only about 200 ns, roughly 15% of the thread's cost.
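To make the "lightweight" claim concrete, here is a minimal sketch that starts ten thousand Goroutines and waits for all of them. The function name spawnAndSum is made up for this illustration; only the sync package from the standard library is used.

```go
package main

import (
	"fmt"
	"sync"
)

// spawnAndSum (hypothetical helper) starts n goroutines; each one adds its
// own index to a shared total. It returns the total after all have finished.
func spawnAndSum(n int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for i := 1; i <= n; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			mu.Lock() // protect the shared counter
			total += v
			mu.Unlock()
		}(i)
	}
	wg.Wait() // block until every goroutine has called Done
	return total
}

func main() {
	// 1 + 2 + ... + 10000
	fmt.Println(spawnAndSum(10000)) // prints 50005000
}
```

Creating the same number of OS threads would typically be far more expensive in both memory and setup time.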

The GMP Model

Although Goroutines may look like threads, they are not threads. The smallest unit that an operating system schedules is a thread, and all code ultimately runs on one. Goroutines are no exception: they need threads to actually run.

Go's runtime uses the GMP model when executing Goroutines:

  • G — Goroutine
  • M — Machine
  • P — Processor

P is a logical processor of the Go runtime. M (Machine) represents an OS thread; it runs the scheduling loop and executes Goroutines. At any given moment, each P is bound to at most one M, and an M must be bound to a P to execute user Goroutines. The binding between P and M is dynamic. For example, when an M blocks in a system call, its P may be detached and attached to another M so that the remaining Goroutines keep running.
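The dynamic binding between P and M can be sketched with a toy model. The types P and M and the functions bind, unbind, and handoff below are hypothetical stand-ins written for this illustration; they are not the runtime's real data structures or API.

```go
package main

import "fmt"

// P and M are toy stand-ins for the runtime's processor and machine
// (hypothetical types, not the runtime's real structs).
type P struct {
	id int
	m  *M // the M currently bound to this P, or nil
}

type M struct {
	id int
	p  *P // the P currently bound to this M, or nil
}

// bind attaches an M to a P; each side records the other.
func bind(m *M, p *P) {
	m.p, p.m = p, m
}

// unbind detaches m from its P and returns that P (nil if m had none).
func unbind(m *M) *P {
	p := m.p
	if p != nil {
		p.m = nil
		m.p = nil
	}
	return p
}

// handoff models an M blocking in a system call: its P is moved to an
// idle M so the Goroutines queued on that P can keep running.
func handoff(blocked, idle *M) {
	if p := unbind(blocked); p != nil {
		bind(idle, p)
	}
}

func main() {
	p := &P{id: 0}
	m1, m2 := &M{id: 1}, &M{id: 2}

	bind(m1, p)
	fmt.Println("P0 runs on M", p.m.id) // P0 runs on M 1

	handoff(m1, m2) // m1 is stuck in a system call
	fmt.Println("P0 runs on M", p.m.id)     // P0 runs on M 2
	fmt.Println("M1 has a P:", m1.p != nil) // M1 has a P: false
}
```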

Goroutine

Unlike an ordinary function call or a plain data structure, a Goroutine is an independent flow of execution that can be paused and resumed. It therefore needs its own program counter (PC) and stack, much like a thread, to record its execution state. A Goroutine saves this state when it is suspended and restores it when it resumes.
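Why a program counter and a private stack are enough to suspend and resume a flow of execution can be shown with a tiny stack machine. Everything in this sketch (the op and G types, step, runInterleaved) is invented for illustration and is far simpler than what the runtime actually stores per Goroutine.

```go
package main

import "fmt"

// op is one instruction of a tiny stack machine (hypothetical).
type op struct {
	push bool // true: push arg; false: pop two values and push their sum
	arg  int
}

// G is a toy goroutine: a program plus the state needed to resume it.
type G struct {
	prog  []op
	pc    int   // index of the next instruction to execute
	stack []int // private execution stack
}

// step executes one instruction and reports whether more remain.
func (g *G) step() bool {
	if g.pc >= len(g.prog) {
		return false // already finished
	}
	in := g.prog[g.pc]
	if in.push {
		g.stack = append(g.stack, in.arg)
	} else {
		n := len(g.stack)
		g.stack = append(g.stack[:n-2], g.stack[n-2]+g.stack[n-1])
	}
	g.pc++
	return g.pc < len(g.prog)
}

// runInterleaved plays the role of an M: it switches between two Gs after
// every instruction. Each G resumes exactly where it left off because its
// pc and stack live inside the G, not inside the M.
func runInterleaved(a, b *G) {
	for moreA, moreB := true, true; moreA || moreB; {
		moreA = a.step()
		moreB = b.step()
	}
}

func main() {
	a := &G{prog: []op{{true, 1}, {true, 2}, {false, 0}}}
	b := &G{prog: []op{{true, 10}, {true, 20}, {false, 0}}}
	runInterleaved(a, b)
	fmt.Println(a.stack[0], b.stack[0]) // prints 3 30
}
```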

When an M executes a Goroutine, it is essentially advancing that Goroutine's program counter and working on its stack. An M cannot run a single Goroutine indefinitely, though, or concurrency would be impossible; it has to switch to other Goroutines at appropriate moments. For performance reasons, Goroutines should not be suspended and resumed arbitrarily. If a Goroutine's stack and state are no longer in the L1/L2 cache, a switch has to load them from the L3 cache or main memory, and time spent on cache misses is wasted CPU time. Choosing the right switch points is therefore critical. In Go's runtime, a Goroutine is switched out when it blocks, for example in a system call, on a mutex, or on a Channel.
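A blocking Channel operation as a switch point can be observed from user code. In this sketch (the function name pingPong is made up), two Goroutines pass a counter back and forth over unbuffered Channels; every send and receive blocks until the other side is ready, which gives the scheduler a natural point to run the other Goroutine.

```go
package main

import "fmt"

// pingPong (hypothetical helper) bounces a counter between the calling
// goroutine and a worker goroutine for the given number of rounds and
// returns the final value.
func pingPong(rounds int) int {
	ping := make(chan int) // unbuffered: a send blocks until received
	pong := make(chan int)

	go func() {
		for v := range ping {
			pong <- v + 1 // blocks until the caller receives
		}
		close(pong)
	}()

	v := 0
	for i := 0; i < rounds; i++ {
		ping <- v  // blocks until the worker receives
		v = <-pong // blocks until the worker replies
	}
	close(ping) // lets the worker's range loop end
	return v
}

func main() {
	fmt.Println(pingPong(3)) // prints 3
}
```

The result is deterministic even though two Goroutines are involved, because each one only makes progress while the other is blocked.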

Note that stack growth (detected at function entry by the stack overflow check) is not a scheduling point: the runtime allocates a new, larger stack, copies the old stack into it, and the same Goroutine continues execution.

Machine

As described above, Goroutines are mostly switched cooperatively, at blocking points. (Since Go 1.14 the runtime can also preempt a Goroutine that runs for too long, so scheduling is no longer purely cooperative.) Each P maintains a local run queue (runq) of Goroutines, and the M bound to that P fetches Goroutines from it for execution. The local queue has limited capacity (256 entries in the current runtime), so ready-to-run Goroutines that do not fit are placed in a global queue. When a P's local queue has no runnable Gs, the M bound to it first tries to fetch work from the global queue. If the global queue is also empty, the M steals Goroutines from the local queues of other Ps. In addition, the scheduler checks the global queue once every 61 scheduling iterations even when the local queue is not empty, so that Goroutines in the global queue do not starve. Taken together, this forms a work-stealing scheduling model.
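The lookup order described above can be sketched as a small simulation. This is a toy model under simplifying assumptions: the names Sched, P, pop, and findRunnable are chosen for this illustration, Goroutines are plain integer IDs, there is no locking, and stealing half of the victim's queue is a simplification of what the real runtime does.

```go
package main

import "fmt"

// P is a toy processor with a local run queue of goroutine IDs.
type P struct {
	runq []int
	tick int // scheduling iterations performed on this P
}

// Sched holds the global queue and all Ps (toy model, no locking).
type Sched struct {
	global []int
	ps     []*P
}

// pop removes and returns the head of a queue.
func pop(q *[]int) (int, bool) {
	if len(*q) == 0 {
		return 0, false
	}
	g := (*q)[0]
	*q = (*q)[1:]
	return g, true
}

// findRunnable picks the next goroutine for p and reports its source.
func (s *Sched) findRunnable(p *P) (int, string, bool) {
	p.tick++
	// Every 61st iteration, look at the global queue first so that
	// goroutines waiting there are not starved by a busy local queue.
	if p.tick%61 == 0 {
		if g, ok := pop(&s.global); ok {
			return g, "global (fairness check)", true
		}
	}
	if g, ok := pop(&p.runq); ok {
		return g, "local", true
	}
	if g, ok := pop(&s.global); ok {
		return g, "global", true
	}
	// Nothing local or global: steal half of another P's queue.
	for _, victim := range s.ps {
		if victim == p || len(victim.runq) == 0 {
			continue
		}
		n := (len(victim.runq) + 1) / 2
		stolen := victim.runq[:n]
		victim.runq = victim.runq[n:]
		p.runq = append(p.runq, stolen[1:]...) // keep the rest locally
		return stolen[0], "stolen", true
	}
	return 0, "", false
}

func main() {
	p0 := &P{}
	p1 := &P{runq: []int{1, 2, 3, 4}}
	s := &Sched{global: []int{9}, ps: []*P{p0, p1}}

	for i := 0; i < 3; i++ {
		g, src, _ := s.findRunnable(p0)
		fmt.Println(g, src)
	}
	// Output:
	// 9 global
	// 1 stolen
	// 2 local
}
```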

The scheduler logic and stack switching during Goroutine suspension and resumption are executed on a special system-stack Goroutine called g0 bound to each M.

When a Goroutine blocks on a channel operation or a mutex, it stops running and is parked in the corresponding wait queue (such as a channel's recvq/sendq or a mutex's wait list). A Goroutine that blocks in a system call stays with its M instead, while the P may be handed to another M. Once the blocking condition is resolved, the Goroutine is moved back to a run queue (either the global queue or a P's local queue) to be rescheduled.
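From user code, parking and waking looks like the following sketch. The function name parkAndWake is made up for illustration. Each Goroutine blocks on a receive from the same Channel; closing the Channel resolves the blocking condition for all of them at once. Whether a given Goroutine has actually parked before close is called depends on timing, but the result is the same either way.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// parkAndWake (hypothetical helper) starts n goroutines that all wait on
// one channel, then releases them together and returns how many resumed.
func parkAndWake(n int) int {
	gate := make(chan struct{})
	var wg sync.WaitGroup
	var woken int64

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-gate // waits here until gate is closed
			atomic.AddInt64(&woken, 1)
		}()
	}

	close(gate) // every receive on a closed channel returns immediately
	wg.Wait()   // wait for all woken goroutines to finish
	return int(atomic.LoadInt64(&woken))
}

func main() {
	fmt.Println(parkAndWake(1000)) // prints 1000
}
```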

Processor

P (Processor) represents a virtual processor that holds a local run queue of Goroutines and maintains the scheduling context. An M must be bound to a P to execute user Goroutines, so the number of Ps caps how many Goroutines can execute Go code in parallel. It is controlled by GOMAXPROCS, either the environment variable or the runtime.GOMAXPROCS function.
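The number of Ps can be inspected and changed from a program. The sketch below uses runtime.GOMAXPROCS and runtime.NumCPU from the standard library; the wrapper setParallelism is a made-up helper for this example.

```go
package main

import (
	"fmt"
	"runtime"
)

// setParallelism (hypothetical helper) sets the number of Ps to n and
// returns the previous and the new setting.
func setParallelism(n int) (prev, now int) {
	prev = runtime.GOMAXPROCS(n) // returns the previous setting
	now = runtime.GOMAXPROCS(0)  // an argument below 1 only queries
	return prev, now
}

func main() {
	fmt.Println("logical CPUs:", runtime.NumCPU())
	fmt.Println("current Ps:  ", runtime.GOMAXPROCS(0))

	prev, now := setParallelism(2)
	fmt.Println("Ps before:", prev, "after:", now)

	runtime.GOMAXPROCS(prev) // restore the original setting
}
```

The same value can be set without code changes by exporting the GOMAXPROCS environment variable before starting the program.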