Goroutines vs OS Threads. Why Go Runs Millions When Others Can’t
A practical explanation of how goroutines work, how the scheduler manages them, and why they’re so cheap
“Abstractions fail when you stop asking what they’re hiding.”
I remember the first time I wrote go func() {}() in a real system. It felt almost too easy, like I was cheating. Spin up a new “thread” with a keyword? No stack configuration, no pthread boilerplate, no tuning? After years of working in languages where concurrency feels like configuring your own kernel module, Go’s model was a breath of fresh air. But it also raised a question: how is this actually working under the hood? How does Go let me create tens of thousands of these things without everything collapsing?
Goroutines are often described as “lightweight threads,” which is true but undersells the clever engineering beneath them. They aren’t just cheaper threads. They’re a fundamentally different abstraction, one that sits on top of the Go runtime rather than directly on the operating system. Once you understand that relationship, you start to see why Go scales the way it does and why goroutines feel so effortless to use.
Goroutines Aren’t Threads, and That’s the Whole Point
A goroutine is just a small state machine managed by the Go runtime. It has a tiny stack (starting at ~2 KB), a pointer to what function it’s running, and some metadata. That’s it. Contrast this with an OS thread, which comes with a large fixed stack, kernel bookkeeping, and expensive scheduling behavior. Creating one goroutine versus one OS thread isn’t a small difference; it’s orders of magnitude.
This is why code like this isn’t reckless:
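for i := 0; i < 500_000; i++ {
	go func() {
		// lightweight task
	}()
}
The runtime can handle it. Half a million OS threads, by contrast, would run into memory and thread-count limits on a typical system long before the loop finished.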
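If you want to see this for yourself, here is a minimal sketch of a runnable version of that loop. The function name spawnAndMeasure and the choice of 100,000 goroutines are mine, not anything standard, and the per-goroutine figure it prints is only a rough estimate taken from runtime.MemStats.StackInuse. Expect a few kilobytes, but the exact number depends on your Go version and platform.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// spawnAndMeasure (a hypothetical helper) starts n goroutines that park on a
// channel, then reports how many were alive at once and roughly how many
// bytes of stack each one cost. n must be positive.
func spawnAndMeasure(n int) (peak int, stackPerG int64) {
	var before, after runtime.MemStats
	var wg sync.WaitGroup
	release := make(chan struct{})

	baseline := runtime.NumGoroutine()
	runtime.ReadMemStats(&before)

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-release // stay parked until main releases everyone
		}()
	}

	// All n goroutines are still alive here because none can pass <-release.
	runtime.ReadMemStats(&after)
	peak = runtime.NumGoroutine() - baseline

	close(release) // wake every goroutine at once
	wg.Wait()

	// Rough estimate: growth in stack memory divided by goroutine count.
	stackPerG = (int64(after.StackInuse) - int64(before.StackInuse)) / int64(n)
	return peak, stackPerG
}

func main() {
	peak, perG := spawnAndMeasure(100_000)
	fmt.Printf("goroutines alive at peak: %d\n", peak)
	fmt.Printf("approx. stack bytes per goroutine: %d\n", perG)
}
```

The sync.WaitGroup is there only so the program does not exit before the goroutines finish; the original loop leaves that part out for brevity.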
A Closer Look at OS Threads vs. Goroutines
To really appreciate goroutines, it helps to zoom in on the machinery beneath them: OS threads. A thread created by the operating system is a relatively heavy object. It carries its own stack, often with several megabytes of address space reserved up front. It requires a syscall to create, and the OS kernel itself is responsible for scheduling it. That means preemption happens at the kernel level, context switches involve saving and restoring a large register set, and switching between threads requires crossing between user space and kernel space. This is powerful but expensive.
Goroutines flip this model by living in user space under Go’s own scheduler. Instead of megabyte stacks, goroutines begin with tiny stacks that grow and shrink as needed. Instead of the kernel deciding when to move work around, the Go runtime multiplexes many goroutines onto a smaller pool of OS threads (the M:N model). Because the runtime controls scheduling, it can make cheaper and more frequent decisions: letting other goroutines run while one is blocked in a syscall, waking up sleepers, migrating goroutines between threads to keep the load balanced, and preserving fairness even when your code is CPU bound (since Go 1.14, a long-running goroutine can be preempted asynchronously). A parked goroutine uses no CPU time, only the memory for its stack, and the garbage collector can shrink stacks that are mostly unused, so idle goroutines stay cheap while runnable ones get quick access to CPU time.
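To make the M:N idea concrete, here is a small sketch that parks a batch of goroutines and then asks the runtime how many OS threads it has created. The helper name countWhileParked is hypothetical. The thread count comes from the predefined "threadcreate" profile in runtime/pprof, which I am treating as a reasonable approximation of the number of threads the runtime has started, not as an exact live count.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
)

// countWhileParked (a hypothetical helper) starts n goroutines that park on a
// channel, then reports the number of goroutines and the number of OS threads
// the runtime has created so far.
func countWhileParked(n int) (goroutines, threads int) {
	var wg sync.WaitGroup
	release := make(chan struct{})

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-release // parked goroutines do not occupy an OS thread
		}()
	}

	goroutines = runtime.NumGoroutine()
	// "threadcreate" is one of pprof's predefined profiles; Count reports
	// how many thread-creation records it currently holds.
	threads = pprof.Lookup("threadcreate").Count()

	close(release)
	wg.Wait()
	return goroutines, threads
}

func main() {
	g, t := countWhileParked(50_000)
	fmt.Printf("goroutines: %d, OS threads created: %d\n", g, t)
}
```

On most machines the second number should be tiny compared to the first, which is the whole point: tens of thousands of logical tasks are being carried by a handful of real threads.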
This split, with OS threads as the real physical workers and goroutines as lightweight logical tasks, lets Go scale to hundreds of thousands or even millions of concurrent operations without overwhelming the kernel. It is concurrency as a language feature rather than an OS tax, and it is the main reason Go feels so effortlessly concurrent once you embrace it.
The G-M-P Model. Go’s Own Scheduler
One of the most interesting things about goroutines is that the OS isn’t in charge of scheduling them. Go builds its own scheduler in user space using three entities: the goroutine (G), an OS thread (M), and a logical processor (P). A P is essentially a token that an M must hold in order to run Go code, and each P owns a local run queue of goroutines. The number of Ps is set by GOMAXPROCS, so it caps how many goroutines can execute Go code at the same instant, not how many can exist.
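One way to get a feel for what a P is: the number of Ps is whatever runtime.GOMAXPROCS is set to, and it limits how many goroutines run Go code simultaneously, not how many you can create. The sketch below (runWithPs is a name I made up for illustration) pins the runtime to a chosen number of Ps, runs a batch of goroutines, and confirms they all complete anyway.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// runWithPs (a hypothetical helper) sets the number of Ps, runs n goroutines
// to completion, and returns the P count in effect plus how many finished.
func runWithPs(ps, n int) (procs int, completed int64) {
	prev := runtime.GOMAXPROCS(ps) // returns the previous setting
	defer runtime.GOMAXPROCS(prev) // restore it when we are done

	var wg sync.WaitGroup
	var done int64
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&done, 1)
		}()
	}
	wg.Wait()

	// GOMAXPROCS(0) only queries the current value without changing it.
	return runtime.GOMAXPROCS(0), atomic.LoadInt64(&done)
}

func main() {
	procs, completed := runWithPs(1, 1000)
	fmt.Printf("Ps: %d, goroutines completed: %d\n", procs, completed)
}
```

With a single P, only one goroutine executes Go code at a time, yet all 1000 still finish. The Gs simply wait their turn in the run queue.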





