What makes goroutines fast? A quick peek into the GMP model inside the Go scheduler.
Note:
The diagrams shown here are conceptual abstractions.
The actual Go scheduler is more dynamic: P-M bindings are not fixed, work stealing is decentralized, and execution involves additional runtime mechanisms such as the network poller and garbage collector.
The goal here is clarity of the core GMP model, not a cycle-accurate runtime trace.
The Go runtime uses the GMP model to schedule and execute goroutines.
Whenever a go statement such as go func() {}() is executed, a new goroutine is created.
What is a goroutine? Is it an OS thread?
Not exactly.
A goroutine is a lightweight, user-space thread of execution created and managed by the Go runtime. This is the G in the GMP model. A newly created goroutine is normally placed on the Local Run Queue (LRQ) of the logical processor (P) whose goroutine created it.
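As a minimal sketch of what creating goroutines looks like in practice, the example below starts one goroutine per input value and waits for all of them. The helper name sumSquares is hypothetical and exists only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares starts one goroutine per input value and adds up the squares.
// The function name is hypothetical; it only illustrates the go statement.
func sumSquares(nums []int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for _, n := range nums {
		wg.Add(1)
		// Each go statement creates a new goroutine (a G).
		go func(n int) {
			defer wg.Done()
			mu.Lock() // total is shared, so guard it
			total += n * n
			mu.Unlock()
		}(n)
	}
	wg.Wait() // block until every goroutine has finished
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4})) // prints 30
}
```

Nothing in this code mentions threads: the runtime decides which M and P end up running each goroutine.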
A P (Processor) is not a CPU core.
It is a runtime construct that holds:
- A local run queue
- Scheduler state
- Per-P memory allocator cache
To execute a goroutine, an OS thread (M) must acquire a P.
An M is a real OS thread created by the Go runtime.
While holding a P, the M executes goroutines.
The Go scheduler also maintains a Global Run Queue (GRQ).
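It comes into play when:
- A local run queue overflows: part of that queue is moved to the GRQ
- A P runs out of local work: it checks the GRQ and, if that is empty too, steals from other Ps' local queues
- Fairness is needed: each P also polls the GRQ periodically so that goroutines waiting there are not starved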
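The overflow case can be pictured with a toy model. The sketch below is not runtime code: the types P and Sched, the methods enqueue and next, and the tiny localCap are all hypothetical, and the real scheduler uses a much larger lock-free queue plus work stealing, which is omitted here. It is loosely modeled on the idea that a full local queue spills about half of its contents, together with the new goroutine, into the global queue.

```go
package main

import "fmt"

// localCap is deliberately tiny for illustration; the real per-P queue
// is a larger fixed-size buffer.
const localCap = 4

// G stands in for a goroutine; here it is just an ID.
type G int

// P is a toy logical processor holding a local run queue.
type P struct{ local []G }

// Sched is a toy scheduler holding the global run queue.
type Sched struct{ global []G }

// enqueue adds g to p's local queue. If that queue is full, it moves the
// older half of the local queue plus g to the global queue.
func (s *Sched) enqueue(p *P, g G) {
	if len(p.local) < localCap {
		p.local = append(p.local, g)
		return
	}
	half := len(p.local) / 2
	s.global = append(s.global, p.local[:half]...)
	s.global = append(s.global, g)
	p.local = append([]G(nil), p.local[half:]...)
}

// next picks the next goroutine for p: local queue first, then global.
func (s *Sched) next(p *P) (G, bool) {
	if len(p.local) > 0 {
		g := p.local[0]
		p.local = p.local[1:]
		return g, true
	}
	if len(s.global) > 0 {
		g := s.global[0]
		s.global = s.global[1:]
		return g, true
	}
	return 0, false
}

func main() {
	s, p := &Sched{}, &P{}
	for g := G(1); g <= 5; g++ {
		s.enqueue(p, g)
	}
	fmt.Println(p.local, s.global) // prints [3 4] [1 2 5]
}
```

The fifth enqueue finds the local queue full, so goroutines 1 and 2 move to the global queue along with goroutine 5, while 3 and 4 stay local.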
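An M can execute Go code only while it holds a P. If an M blocks in a system call, its P can be handed to another M so the remaining goroutines keep running.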
M:N Multiplexing
Note: Abstract diagram to visualize the M:N thread multiplexing
Go uses M:N multiplexing.
This means:
- Many goroutines (G)
- Are multiplexed onto fewer OS threads (M)
- Which are then scheduled by the OS onto CPU cores.
This is what makes goroutines cheap and efficient.
Cheap - because goroutines are created and managed entirely in user space: each starts with a small, growable stack (a few kilobytes) and does not require creating a new OS thread.
Efficient - because switching between goroutines happens mostly in user space, avoiding expensive kernel-level thread context switches.
Why This Matters
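Parallelism in Go is controlled by GOMAXPROCS, which determines how many logical processors (P) exist. By default it is derived from the number of CPUs available to the process.
At most GOMAXPROCS goroutines can execute Go code simultaneously; threads blocked in system calls on behalf of goroutines do not count against this limit.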
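The current setting can be inspected and changed at run time with runtime.GOMAXPROCS. In the sketch below, procsDuring is a hypothetical helper that temporarily changes the value and then restores it; calling runtime.GOMAXPROCS(0) only queries the setting.

```go
package main

import (
	"fmt"
	"runtime"
)

// procsDuring sets GOMAXPROCS to n, records the value observed while it is
// in effect, restores the previous setting, and records that as well.
// The helper name is hypothetical.
func procsDuring(n int) (during, restored int) {
	prev := runtime.GOMAXPROCS(n)    // returns the previous setting
	during = runtime.GOMAXPROCS(0)   // an argument below 1 only queries
	runtime.GOMAXPROCS(prev)         // put the old value back
	restored = runtime.GOMAXPROCS(0)
	return during, restored
}

func main() {
	fmt.Println("logical CPUs:", runtime.NumCPU())
	fmt.Println("GOMAXPROCS:  ", runtime.GOMAXPROCS(0))

	during, restored := procsDuring(1)
	fmt.Println("during:      ", during) // 1: only one P runs Go code
	fmt.Println("restored:    ", restored)
}
```

The same limit can also be set without code changes through the GOMAXPROCS environment variable.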
The OS still schedules threads onto CPU cores.
Go does not replace the OS scheduler; it works with it.
Two schedulers operate together:
- Go runtime scheduler: G -> M (via P)
- OS scheduler: M -> CPU
This layered design is what allows Go to scale to hundreds of thousands of goroutines without overwhelming the operating system.