2025/01/23
A few days ago, I was working on a project that required me to use Node.js. I was helping a colleague set up the Docker container used to run the project, and we had to choose between the Alpine and Debian images as its base. I was heavily against using Alpine, and had to explain why to my colleague. I decided to write this article to share my stance and explain the rationale behind it, and as an exercise in collecting detailed proof of my POV on the matter.
If you don’t know what Alpine is, don’t talk to me. Alpine Linux is a lightweight Linux distribution designed with security and simplicity in mind. It is known for its small size and minimalistic design, making it a popular choice for containerized applications. Alpine Linux uses `musl` as its standard C library, which is known for its small size and performance optimizations.
If you don’t know what Debian is, consider a career in carpentry. However, for the sake of completeness: Debian is a popular Linux distribution known for its stability and large package repository. It is widely used in production environments and is a popular choice for server deployments. C code compiled on Debian systems is compiled against `glibc`, the GNU C library, which is known for its feature-richness and compatibility with a wide range of software.
You might’ve read the previous sections and thought, “Who cares about the standard C library used in a Docker container? I’m not a kernel developer, I’m just trying to run my Node.js application.” And you’d be right. For most applications, the choice of standard C library doesn’t matter. However, if you know me at all, you know that I care about the nitty-gritty low-level details of software development. And I won’t let my application run on anything whose implementation details I don’t understand.
Node.js is a C++ application at its core, and it interacts with the operating system through system calls and the standard C library. The choice of standard C library can have an impact on the performance of the application, especially when it comes to I/O-bound operations, and thinking that a JIT-compiled language like JavaScript is immune to these low-level details is naive.
Syscalls and I/O operations are the most common ways that a Node.js application interacts with the operating system. `glibc`’s implementations of most of them (I’m talking `malloc`, `memcpy`, `printf`, etc.) are heavily optimized. `malloc`, for example, is implemented in a way that makes allocating and deallocating memory fast and efficient, and it differs from `musl`’s implementation in several respects.
Why is this important? Well, `malloc` is our key to the magical heap.
The heap is essentially a list of memory regions that a running program uses to store data on dynamically. Accessing memory on the heap is of course slower than working with the stack: in a stack frame, data (or instructions) is simply pushed to and popped from memory in a sequential, atomic fashion, so there’s no need to seek out particular memory addresses. On the heap, however, the program must search for an empty memory block that is large enough to store the data. I don’t need to tell you that this is a much slower process than the stack operation, so the way memory allocators optimise the de/allocation of memory blocks is crucial for the performance of the program.
If the stack is so much faster than the heap, why would we even need the heap? Because sometimes we need to allocate memory dynamically in the program, and the heap is designed to cope with exactly that need. The data stored in heap regions is requested during runtime. This also means that when a program doesn’t use functions like `malloc`, no heap will be initialized in its memory at all.
`musl` and `glibc` are two different implementations of the standard C library, and they have different design philosophies.
`glibc` has been around for a long time (since 1987, probably older than me and you) and is known for its feature-richness and compatibility with a wide range of software. It’s accumulated a ton of triage and consequent optimizations over the years, and it’s the default C library on most Linux distributions. However, `glibc` is also known for its large size and memory footprint, which can be a problem for resource-constrained environments.
`musl`, on the other hand, is a newer implementation of the standard C library (its first release was in 2011), designed with simplicity and a light footprint in mind. It is known for its small size and minimalistic design, and it is optimized for static linking, which lets developers ship applications as small, single binaries with faster startup times, keeping their runtime memory footprint low. A godsend for embedded systems and IoT devices. `musl` is the default C library on Alpine Linux, and has become a popular choice for containerised applications because of its minimalistic design philosophy.
However, `musl` is also known for its lack of compatibility with some software that expects `glibc`-specific behavior (or maybe `glibc` is known for doing sh!t its own way) and for putting simplicity over tunability, and this has a certain impact on the performance of some applications compiled against it.
Well, now you know the difference between `musl` and `glibc`, and you might’ve guessed that the choice of standard C library is what makes the difference in performance between Alpine and Debian. But how does it actually affect the performance of a Node.js application on a practical level? The internet is full of theories with no practical examples, but you know this blog’s existence is entirely devoted to clearing the fog of confusion around software development, so let’s provide some straightforward explanations.
`glibc` uses the `ptmalloc` (“pthreads malloc”) implementation of `malloc`, a modern derivation of `dlmalloc` that is optimized for multi-threaded applications. It uses sorted lists instead of binary tries in bins for large blocks, plus a special array of fast bins for recently freed small blocks. When freed, these blocks are kept marked as used and put into the appropriate fast bin. Allocation is satisfied using an exact fit from the fast bins when possible. Fast bins are emptied under multiple heuristic conditions. The big difference between `ptmalloc` and `dlmalloc`, though, is in the support for concurrent allocations. The `ptmalloc` allocator maintains multiple arenas. Each arena is an independent instance of the allocator protected by a lock. Upon allocation, the thread invoking the allocator attempts to lock an arena, starting with the one it used last. If an arena is locked successfully, it is used to satisfy the allocation request; otherwise a new arena is allocated. `ptmalloc`’s architecture avoids lock contention as much as possible. But its strength is (as always in cases like this) also in a much better API, which makes supporting slabs and array or struct elements super neat via `independent_comalloc`, which speeds up my compiled code by ~20%. Its thread stack size is variable, depending on resource constraints, but can be set to up to ~10MB, making it suitable for workloads split into a large number of concurrent threads.
`musl` uses a `dlmalloc`-style implementation of `malloc`, which is optimized for simplicity and minimalism. It uses a single heap for all threads, and it doesn’t have the same level of triage as `ptmalloc`. It uses a binary trie structure for small blocks and a sorted, doubly-linked list for large blocks. It doesn’t support concurrent allocations: there’s a single lock for all threads, which can lead to lock contention in multi-threaded applications. Memory management is differentiated for large and small blocks of memory:
For small (< 256kb) blocks, `brk` (a kernel-level memory allocation call) is used, with one detail: more memory than needed is requested, so later allocations can be served without diving down to the kernel level. On `free`, the memory isn’t released directly at the kernel level; it’s just marked as free.
For large (> 256kb) blocks, `mmap` and `munmap` are called, so no caching is needed. This is a good thing, because it’s a lot faster than `brk` for large blocks. But it can also be a bad thing, because it’s a lot slower than `brk` for small blocks.
Its thread stack size is fixed at ~128KB, which is a lot smaller than `glibc`’s default. This can be a problem for workloads that require a large number of concurrent threads, as the stack size can be a limiting factor.
With all that said, it’s clear that the choice of standard C library can have a significant impact on the performance of a Node.js application.
Most web applications do heavy string processing, I/O operations, and/or depend on delegating workloads to native dependencies, all factors which can be affected by the memory management and thread allocation strategies of the standard C library used a few layers below the Node.js runtime.
In general, if you’re running a Node.js application and want to maximize its runtime performance, expecting a high number of concurrent threads, then `glibc` is the way to go. If you’re running a Node.js application in a resource-constrained environment and want to minimize its memory footprint and startup time, then `musl` is probably the way to go for you.