Ruby Performance — Measuring What Matters in Production
Ruby performance work is mostly about knowing what to measure and having the discipline not to optimise things that do not matter. The language has become significantly faster over the past several years—YJIT alone delivers 15-30% throughput improvements on typical Rails workloads according to Shopify's production benchmarks—but speed improvements at the language level do not help if your application spends 80% of its time waiting on database queries or external APIs. This topic covers the performance surface that Ruby and Rails developers actually need to understand: memory allocation patterns, garbage collection mechanics, YJIT behavior, profiling tools and the methodology for turning production observations into targeted improvements. For hands-on benchmarking notes, see the performance experiments. The web performance guide covers the full-stack view from the browser's perspective.
Memory allocation: where most Ruby performance problems start
The number one performance lever in most Ruby applications is reducing object allocations. Every object allocated is an object the garbage collector must eventually trace and potentially sweep. High allocation rates cause GC pressure, which causes GC pauses, which cause latency spikes in production.
The typical Rails request allocates thousands of objects. A simple page render can easily create 5,000 to 20,000 objects. Most of these are strings—from template rendering, header construction, URL generation and parameter parsing. Each string allocation costs memory, CPU time for initialization, and future GC work.
Reducing allocations does not mean rewriting your application in a functional style. It means identifying the hot paths—the code that runs on every request—and looking for allocation patterns that can be eliminated without hurting readability. Frozen string literals, memoization, avoiding unnecessary intermediate arrays and using each instead of map when you do not need the result array are all low-effort, high-impact changes.
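The each-versus-map point can be made concrete with CRuby's allocation counter. This is a small sketch—the `allocations` helper is ours, and exact counts vary by Ruby version—but the relative difference is what matters:

```ruby
# Count objects allocated by a block using a CRuby-specific GC counter.
# The helper name (`allocations`) is illustrative, not a library API.
def allocations
  GC.disable
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable
end

items = (1..1_000).to_a

# map builds an intermediate result array on top of the strings...
with_map  = allocations { items.map(&:to_s) }
# ...while each allocates only the strings themselves.
with_each = allocations { items.each(&:to_s) }

puts with_map > with_each
```

When the result array is actually used, map is the right tool; the win comes from not paying for an array nobody reads.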
The memory_profiler gem is the best tool for understanding allocation patterns in Ruby. Running a request through MemoryProfiler.report shows you exactly which lines of code allocate the most objects, broken down by type and source location.
Garbage collection: understanding the generational collector
Ruby uses a generational garbage collector (RGenGC, introduced in Ruby 2.1) with two generations—young and old—and an incremental marking strategy (since Ruby 2.2). Objects start in the young generation and are promoted to the old generation if they survive enough GC cycles.
The practical impact of generational GC is that short-lived objects (which are the majority of allocations in a typical Rails request) are collected cheaply in minor GC runs. Long-lived objects that get promoted to the old generation are traced less frequently but more expensively in major GC runs.
GC tuning in Ruby is done through environment variables: RUBY_GC_HEAP_INIT_SLOTS, RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO, RUBY_GC_HEAP_FREE_SLOTS_MAX_RATIO, RUBY_GC_HEAP_GROWTH_FACTOR and several others. The defaults are reasonable for most applications, but applications with very high allocation rates or large working sets can benefit from tuning.
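As a sketch of what tuning looks like in practice—the values here are purely illustrative, not recommendations, and the puma invocation assumes a standard Rails setup:

```shell
# Start the app server with a larger initial heap and gentler growth,
# so a known steady-state working set fits without repeated heap resizing.
# Measure GC.stat before and after changing anything.
RUBY_GC_HEAP_INIT_SLOTS=600000 \
RUBY_GC_HEAP_GROWTH_FACTOR=1.1 \
bundle exec puma -C config/puma.rb
```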
The most common GC tuning mistake is increasing heap size without understanding why GC is running frequently. If your application allocates heavily on every request, increasing the heap just delays GC without reducing total GC time. The better approach is to reduce allocations on the hot paths first, then tune the heap size to match the application's steady-state working set.
You can observe GC behavior with GC.stat, which returns a hash of counters including total collections, heap slot counts, and major/minor GC run counts. The gc_tracer gem provides time-series data for correlating GC activity with application latency.
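A minimal way to watch those counters move—key names are CRuby-specific:

```ruby
# Snapshot the GC counters, force a full (major) collection, and
# observe the counters change.
before = GC.stat.slice(:count, :minor_gc_count, :major_gc_count)
GC.start # full_mark: true by default, so this triggers a major GC
after  = GC.stat.slice(:count, :minor_gc_count, :major_gc_count)

puts before.inspect
puts after.inspect
```

In production you would sample these periodically (or use gc_tracer) rather than forcing collections by hand.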
YJIT: what it actually does in production
YJIT (Yet Another Ruby JIT) shipped as an experimental feature in Ruby 3.1 and became production-ready in Ruby 3.2. It compiles frequently executed Ruby bytecode into native machine code at runtime, reducing the interpretation overhead for hot methods.
Shopify's production data shows YJIT delivering 15-30% throughput improvements on their Rails monolith. The improvements are most pronounced for CPU-bound workloads: view rendering, serialization, business logic computation. For I/O-bound workloads that spend most time waiting on database or network calls, YJIT's impact is smaller because the bottleneck is not Ruby execution speed.
Enabling YJIT is a one-line change: pass the --yjit flag when starting Ruby, or set RUBY_YJIT_ENABLE=1 in the environment. There is a memory cost—YJIT's compiled code occupies additional memory, typically 50-100MB for a large Rails application—but the throughput improvement usually justifies it.
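You can confirm at runtime whether the flag actually took effect—useful in deploy checks, since RubyVM::YJIT is only defined on CRuby builds that include YJIT (3.1+):

```ruby
# Runtime check for YJIT; safe to run on any Ruby.
yjit_on = defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?

if yjit_on
  puts "YJIT enabled"
else
  puts "YJIT disabled (start Ruby with --yjit or RUBY_YJIT_ENABLE=1)"
end
```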
YJIT's effectiveness depends on code patterns. Method dispatch polymorphism (calling the same method on objects of many different classes) reduces YJIT's ability to generate efficient code. Monomorphic call sites (same method, same receiver type) are where YJIT shines. This is not something you should optimise for directly, but it explains why some codebases see larger improvements than others.
Monitor YJIT's impact by comparing key metrics—p50 and p99 response times, throughput (requests per second), memory usage—with and without YJIT enabled. Do not rely on synthetic benchmarks; production workloads are the only reliable indicator.
Object shapes: the hidden performance dimension
Ruby 3.2 introduced object shapes, an internal optimization that speeds up instance variable access when objects of the same class have the same set of instance variables initialized in the same order. This is a CRuby implementation detail, not a language feature, but it has practical implications for performance-sensitive code.
When all instances of a class initialize the same instance variables in the same order, CRuby can use a fast path for variable access. When different instances have different shapes (because some code paths set variables conditionally), the runtime falls back to a slower dictionary-style lookup.
The practical guidance: initialize all instance variables in initialize, even if some are set to nil. Avoid setting instance variables conditionally in methods that run frequently. This is good coding practice regardless of the performance implications.
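A sketch of that guidance, using a hypothetical Order class:

```ruby
# Shape-friendly class: every instance defines the same ivars in the
# same order inside initialize, so all instances share one object shape.
class Order
  def initialize(items)
    @items    = items
    @discount = nil   # defined up front even though it may stay unset
    @shipped  = false
  end

  def apply_discount(amount)
    @discount = amount # writes to an existing ivar: no new shape needed
  end
end

order = Order.new([])
puts order.instance_variables.inspect
```

If @discount were only ever set inside apply_discount, discounted and undiscounted orders would have different shapes, and frequently-run ivar reads would take the slower path.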
Profiling methodology
The biggest mistake in Ruby performance work is profiling the wrong thing. Micro-benchmarks in isolation tell you about language overhead. Production profiling tells you about your application's actual bottleneck.
A sound profiling methodology follows this order:
- Establish a baseline. Measure current p50, p95 and p99 response times in production. Use application performance monitoring (APM) data if available.
- Identify the bottleneck. Is the slowness in Ruby execution, database queries, external API calls, view rendering, or asset serving? APM tools like Scout, Skylight or custom instrumentation can break down request time by category.
- Profile the hot path. Once you know which category dominates, use targeted profiling. For Ruby CPU time, use stackprof in CPU mode. For memory, use memory_profiler. For database, use EXPLAIN ANALYZE on slow queries.
- Make one change. Optimise the most impactful bottleneck. Deploy. Measure again.
- Repeat. Performance work is iterative. The bottleneck shifts after each improvement.
stackprof is the recommended CPU profiler for production Ruby. It uses sampling to capture call stacks at regular intervals with minimal overhead. The output shows which methods consume the most wall time or CPU time, with call graph context.
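A minimal stackprof session looks like this—the gem must be in the bundle, and the profiled loop is just a stand-in for a real hot path:

```ruby
# Sample CPU stacks at a 1000-microsecond interval while a block runs.
# Guarded so the sketch degrades gracefully if the gem is absent.
begin
  require "stackprof"

  profile = StackProf.run(mode: :cpu, interval: 1000) do
    100_000.times { |i| Math.sqrt(i) }
  end

  # Without an :out option, StackProf.run returns the raw results hash;
  # the stackprof CLI or StackProf::Report renders it readably.
  puts profile.class
rescue LoadError
  warn "stackprof not installed; add gem \"stackprof\" to the Gemfile"
end
```

In production the usual pattern is sampling a fraction of requests via middleware and aggregating the dumps offline.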
For memory profiling, derailed_benchmarks measures memory usage during boot and request handling. It is particularly useful for tracking memory growth across deploys—a gradual increase in boot-time memory often indicates a gem adding more eager-loaded code.
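The two derailed_benchmarks tasks most relevant here, assuming the gem is in the Gemfile:

```shell
# Memory required to boot, broken down by gem:
bundle exec derailed bundle:mem
# Memory growth while the app serves repeated requests:
bundle exec derailed exec perf:mem_over_time
```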
Common performance anti-patterns
- N+1 queries. The classic. Loading a collection and then querying for associated records individually inside a loop. Use includes, preload or eager_load to batch association loading.
- String allocation in loops. Building strings with + or interpolation inside tight loops. Use String#<< for mutable appending, or Array#join for collecting parts.
- Excessive serialization. Converting large ActiveRecord collections to JSON by loading full objects and then serializing everything. Use select to load only needed columns, and serializer libraries that avoid full object instantiation.
- Ignoring database indexing. Running table scans on growing tables because the query was fast enough with 1,000 rows. See the PostgreSQL indexing guide for systematic indexing strategy.
- Premature caching. Adding cache layers before understanding the bottleneck. Caching hides performance problems without fixing them and adds cache invalidation complexity.
- Benchmarking in development. Development mode has code reloading, unoptimised queries (no query cache), and no YJIT. Production performance numbers are the only numbers that matter.
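The string-allocation anti-pattern and its fixes, side by side in a small sketch:

```ruby
# Three ways to build the same string. Repeated `+` allocates a new
# string every iteration; `<<` mutates a single buffer; Array#join
# collects parts and concatenates once.
parts = %w[ruby is fast enough]

plus = ""
parts.each { |w| plus = plus + w + " " } # new string per iteration

buffer = +""                             # unary + gives a mutable string
parts.each { |w| buffer << w << " " }    # appends in place

joined = parts.join(" ") << " "

puts plus == buffer && buffer == joined
```

All three produce the same result; in a loop that runs on every request, the difference shows up as GC pressure rather than wall time in any single request.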
Sub-topic map
| Sub-topic | Key concern | Related content |
|---|---|---|
| Memory allocation | Object counts, allocation hot spots | Performance notes |
| Garbage collection | GC tuning, generational behavior | Performance notes |
| YJIT | Throughput improvement, memory trade-off | Performance notes |
| Database performance | Query time, indexing, connection pooling | PostgreSQL Indexing |
| Web performance | Full-stack view, asset budgets, caching | Web Performance |
| Profiling | stackprof, memory_profiler, methodology | Debugging guide |
Frequently asked questions
Is Ruby slow?
Ruby is slower than compiled languages for raw computation, but the bottleneck in most Rails applications is not Ruby execution speed. It is database queries, network I/O and memory allocation patterns. With YJIT enabled, Ruby 3.3+ is fast enough for the vast majority of web applications.
Should I switch to a faster language?
Almost certainly not. Rewriting a working application in a faster language to solve a performance problem you have not profiled is one of the most expensive mistakes in software engineering. Profile first. The bottleneck is rarely where you think it is.
How much does YJIT help?
In Shopify's production data, 15-30% throughput improvement on a large Rails application. Your mileage will vary depending on your workload profile. CPU-bound workloads benefit most. I/O-bound workloads benefit least.
What is the single most impactful performance improvement for a typical Rails app?
Adding proper database indexes to the queries that run most frequently. This usually delivers more improvement than any Ruby-level optimization, because database time dominates total request time in most applications.