Performance Notes: Ruby Memory and GC Behavior Under Load

Context

These are field notes from profiling several Ruby applications under sustained traffic — not synthetic benchmarks, but production-like workloads running for hours at a time. The goal was to build a clearer mental model of how Ruby manages memory, when garbage collection becomes a bottleneck, and which tuning parameters actually shift behavior in meaningful ways.

Most of what I found reinforced things the Ruby internals community has said for years. Some of it surprised me. I'm recording both.

Observation 1: Heap Growth Is Not Linear

The first thing that becomes obvious when you watch GC.stat over time is that Ruby's heap doesn't grow in a smooth curve. It grows in steps. The VM allocates heap pages in batches, and those pages stay allocated even after the objects occupying them are collected.

Under sustained load, the pattern looks like this: traffic arrives, objects get allocated, the heap expands to accommodate them, GC runs, most objects are freed — but the heap pages remain. The next wave of traffic fills those pages again. If the next wave is slightly larger, another batch of pages gets allocated. Over hours, the resident set size (RSS) climbs in a staircase pattern.

This is normal. It is not a memory leak. But it looks like one if you're only watching RSS on a dashboard without context.

The key metric from GC.stat is :heap_available_slots versus :heap_live_slots. When the ratio between available and live slots stabilizes, the heap has found its working size. If it never stabilizes, something is holding references it shouldn't be.
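A minimal sketch of that check, sampling the two GC.stat keys mentioned above (the helper name and interval are my own, not a standard API):

```ruby
# Sketch: sample live vs. available heap slots to see whether the heap
# has found its working size. Helper name is illustrative.
def heap_occupancy
  stat      = GC.stat
  live      = stat[:heap_live_slots]
  available = stat[:heap_available_slots]
  { live: live, available: available, ratio: (live.to_f / available).round(3) }
end

before = heap_occupancy
100_000.times { +"some transient string" }  # burst of short-lived objects
GC.start
after = heap_occupancy
# Under steady load, a ratio that settles across snapshots means the heap
# has reached its working size; a ratio that keeps climbing suggests
# something is retaining references.
```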

Observation 2: The Difference Between Bloat and Leaks

I want to be precise about this because the two get conflated constantly.

Memory bloat is when the process uses more memory than the steady-state workload requires, usually because a burst of traffic or a single expensive request forced a heap expansion that never contracted. Ruby's GC will free objects, but it rarely returns pages to the operating system. The memory is available for reuse within the Ruby process, but the OS still sees it as consumed.

A memory leak is when objects accumulate indefinitely because something holds a reference to them — a growing cache without eviction, a class-level array that gets appended to on every request, a closure capturing variables in a long-lived scope. The distinguishing feature is that :heap_live_slots grows without bound.
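The class-level-array leak pattern just described can be reduced to a few lines (the class here is a hypothetical illustration, not from any real codebase):

```ruby
# A minimal illustration of a leak: a class-level array that grows on
# every "request" and is never evicted. Hypothetical class.
class RequestLog
  RETAINED = []  # class-level collection with no eviction policy

  def self.record(payload)
    RETAINED << payload  # every call retains another object forever
  end
end

# Each call grows :heap_live_slots; GC can never reclaim these strings
# because RETAINED still references every one of them.
1_000.times { |i| RequestLog.record("request #{i}") }
```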

The diagnostic approach is different for each. For bloat, you're looking at allocation patterns and asking whether you can reduce peak allocation. For leaks, you're looking at ObjectSpace.each_object and asking what's being retained and by whom.

I've found ObjectSpace.count_objects useful as a first pass. Run it at intervals and watch which object types are growing. If T_STRING or T_ARRAY counts climb without ceiling, that's where to dig. For deeper analysis, the objspace standard library extension gives you allocation source locations, which narrows things down fast.
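The interval-diffing first pass can be sketched like this (the helper name is mine; the technique is just subtracting two ObjectSpace.count_objects snapshots):

```ruby
# First-pass diagnostic sketch: diff ObjectSpace.count_objects around a
# block of work and report which object types grew.
def object_count_delta
  before = ObjectSpace.count_objects.dup
  yield
  after = ObjectSpace.count_objects
  after.each_with_object({}) do |(type, count), delta|
    diff = count - before.fetch(type, 0)
    delta[type] = diff if diff > 0
  end
end

growth = object_count_delta do
  # Deliberately retain 10,000 strings to simulate a leak.
  @retained = Array.new(10_000) { |i| "leaky string #{i}" }
end
# T_STRING (and T_ARRAY, for the backing array) should dominate the delta.
```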

Observation 3: GC Pauses Under Load Are Spiky, Not Gradual

Ruby's garbage collector (in CRuby 3.x) uses a generational, incremental, mark-and-sweep approach. Minor GC runs are fast and frequent. Major GC runs are slower and less frequent. Under light load, you rarely notice either.

Under sustained load — say, 200+ requests per second on a single Puma worker — the pattern changes. Minor GC runs get more frequent because allocation rate is higher. Occasionally, a major GC run coincides with a request, and that request's latency spikes. Not by 10ms. Sometimes by 50–100ms.

The GC.stat field :major_gc_count lets you track how often this happens. Cumulative time spent in GC is available via GC::Profiler.total_time (and, on Ruby 3.1+, as GC.stat[:time], in milliseconds). Dividing that by request count gives you per-request GC overhead, which is the number I actually care about.

In the applications I profiled, per-request GC time ranged from 2ms to 15ms depending on how much allocation each request triggered. The high end was always correlated with requests that built large intermediate data structures — assembling a big JSON response, rendering a complex view with many partials, or running a query that materialized thousands of ActiveRecord objects.
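A sketch of that per-request overhead calculation, with an allocation-heavy stand-in for a real request handler (handle_request is illustrative):

```ruby
# Sketch: cumulative GC time divided by requests served. The
# "handle_request" body is a stand-in for real request work.
GC::Profiler.enable

def handle_request
  # Simulate building a large intermediate data structure per request.
  Array.new(5_000) { |i| { id: i, body: "payload #{i}" } }
end

requests = 200
requests.times { handle_request }

total_gc_seconds = GC::Profiler.total_time          # cumulative, in seconds
per_request_ms   = (total_gc_seconds / requests) * 1000.0
puts format("GC overhead: %.3f ms/request", per_request_ms)

GC::Profiler.disable
```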

Observation 4: Tuning Parameters That Moved the Needle

I tested several environment variables that influence Ruby's GC behavior. Here's what I observed.

RUBY_GC_HEAP_INIT_SLOTS: Setting this higher (say, 600000) pre-allocates heap slots at boot. This reduces the number of GC runs during the initial warm-up period, which matters if your first few hundred requests would otherwise trigger repeated heap expansion. After warm-up, the effect is negligible. I found it most useful for applications with slow warm-up and aggressive health checks that penalize early latency.

RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO: This sets the minimum fraction of heap slots that must be free after a GC run; if the free ratio falls below it, the VM allocates more pages. Lowering it reduces memory usage but increases GC frequency. Raising it does the opposite. I found the default (0.20) reasonable for most applications. I only changed it when profiling showed clear over- or under-collection.

MALLOC_ARENA_MAX (glibc, not Ruby): This one surprised me. Setting MALLOC_ARENA_MAX=2 on Linux reduced RSS by 15–25% in multi-threaded Puma configurations. The default glibc behavior allows up to eight malloc arenas per CPU core on 64-bit systems, which leads to memory fragmentation across arenas. Reducing the arena count forces more contention but dramatically less fragmentation. For I/O-bound Rails applications, the contention cost was undetectable in my measurements.

RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR: Controls when a major GC is triggered based on the growth of old-generation objects. I experimented with raising this from the default (2.0) to 3.0, which reduced major GC frequency but increased peak memory. The trade-off was only worth it for applications where p99 latency mattered more than memory cost.
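When experimenting with these variables, it helps to confirm at boot that the tuning actually took effect. One way is to inspect GC.stat directly; a sketch (key names are CRuby-specific and may shift between versions):

```ruby
# Sketch: verify GC tuning at boot by inspecting GC.stat. Keys are
# CRuby-specific; check your Ruby version's documentation.
stat = GC.stat
puts "heap slots available: #{stat[:heap_available_slots]}"
puts "old objects:          #{stat[:old_objects]}"
puts "old objects limit:    #{stat[:old_objects_limit]}"
# With RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=3.0 you would expect the
# limit to sit roughly 3x above the old-object count after a major GC.
```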

Observation 5: Measurement Approach That Worked

The setup I settled on was straightforward:

  1. Add GC.stat logging to a Rack middleware that records stats on every Nth request (I used every 100th).
  2. Export :heap_live_slots, :heap_free_slots, :major_gc_count, :minor_gc_count, and :total_allocated_objects to a time-series store.
  3. Run production-shaped traffic through the application for at least two hours before drawing conclusions.
  4. Compare baseline runs against tuned runs with identical traffic profiles.
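Step 1 can be sketched as a small Rack middleware (class name, logger hook, and key list are illustrative; the counter is not thread-safe, which is acceptable for sampling):

```ruby
# Sketch of step 1: log a subset of GC.stat every Nth request.
# Class name and logger are illustrative; adapt to your metrics pipeline.
class GcStatsSampler
  KEYS = %i[heap_live_slots heap_free_slots major_gc_count
            minor_gc_count total_allocated_objects].freeze

  def initialize(app, every: 100, logger: ->(h) { puts h.inspect })
    @app, @every, @logger = app, every, logger
    @count = 0  # not thread-safe; fine for coarse sampling
  end

  def call(env)
    @count += 1
    if (@count % @every).zero?
      stat = GC.stat
      @logger.call(KEYS.to_h { |k| [k, stat[k]] })
    end
    @app.call(env)
  end
end
```

In a Rails app this would be registered with something like `config.middleware.use GcStatsSampler, every: 100`, with the logger lambda swapped for a call into your time-series exporter.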

Short benchmark runs — under 10 minutes — gave misleading results because the heap hadn't stabilized. This was the single biggest source of false conclusions in my earlier experiments. Ruby's memory behavior at minute 5 is different from its behavior at minute 90.

I also found that derailed_benchmarks (a gem by Richard Schneeman) is useful for catching obvious per-request allocation problems before you get to production profiling. It's not a substitute for load testing, but it catches low-hanging fruit.

What I'd Investigate Next

I haven't yet done systematic testing of Ruby 3.2+'s variable-width allocation changes on memory behavior under load. The theory is that embedding small objects directly in heap slots reduces fragmentation, but I want to measure it under the same sustained-load conditions before making claims.

I'm also interested in how YJIT interacts with memory profiles. JIT-compiled code occupies memory that doesn't show up in GC.stat, which means the RSS picture changes in ways the standard Ruby memory tools don't capture well.

For more on Ruby performance patterns, see the Ruby Performance topic. For broader application-level performance work, the Web Performance for Rails Developers guide covers the full stack from database queries through to CDN configuration.