When your system starts to stutter under load, garbage collection (GC) is often the silent culprit. It is the hidden tax on throughput that creeps up when object creation gets ahead of cleanup. A few milliseconds of pause might not seem like much, but in a low-latency trading system, an online ad exchange, or a telemetry pipeline handling millions of events per second, those milliseconds multiply into missed opportunities and lost revenue.
At its core, garbage collection is about trade-offs. It is the balance between memory efficiency, pause time, and CPU usage. Tuning it well is as much art as it is engineering. Done right, it turns unpredictable latency into steady throughput. Done poorly, it can bring your system to its knees.
You cannot tune what you do not measure, and you cannot measure without understanding your memory behavior.
Understanding Garbage Collection Mechanisms
Before touching any settings, you need to know which collector you are tuning and what its model of memory looks like.
Generational collectors, such as HotSpot's G1 and Parallel (and ZGC since JDK 21), divide the heap into young and old regions. Short-lived objects are reclaimed frequently, while long-lived ones are promoted. The idea is that most objects die young, so you save time by cleaning only those areas often.
Concurrent collectors like ZGC and Shenandoah aim to reduce pause times by performing much of their work concurrently with the application. They trade CPU overhead for predictability.
Stop-the-world collectors such as the Parallel GC favor raw throughput at the expense of latency. These are often used for batch systems, not interactive ones.
The key difference lies in pause behavior. Throughput-oriented systems can tolerate occasional longer pauses as long as total GC overhead stays low. Latency-sensitive and real-time systems must minimize even the shortest stop.
Step 1: Measure Allocation and Promotion Rates
Tuning starts with observation. Use tools like jstat, Async Profiler, or Java Flight Recorder to measure:
- Allocation rate (objects created per second)
- Promotion rate (how fast objects move from the young to the old generation)
- GC pause frequency and duration
A rule of thumb: if your young generation fills up faster than it can be collected, you need to expand it or reduce short-lived allocations. If objects survive too long and inflate the old generation, promotion pressure will increase pauses.
Pro tip: Run your app with -Xlog:gc*:gc.log and analyze it with GCViewer or GCeasy. Look for spikes in “promotion failed” or “full GC” events—these usually signal that the old generation is too small or that the collector cannot keep up.
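Beyond log analysis, you can sample allocation counters in-process. The sketch below uses HotSpot's com.sun.management.ThreadMXBean extension (a real but HotSpot-specific API; the class name AllocationRate and the synthetic workload are illustrative) to measure how many bytes the current thread allocates across a unit of work:

```java
import java.lang.management.ManagementFactory;

public class AllocationRate {
    public static void main(String[] args) {
        // HotSpot-specific extension of the standard ThreadMXBean that
        // exposes per-thread allocation counters.
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long tid = Thread.currentThread().getId();

        long before = bean.getThreadAllocatedBytes(tid);

        // Simulated workload: a batch of short-lived arrays.
        byte[][] junk = new byte[1000][];
        for (int i = 0; i < junk.length; i++) {
            junk[i] = new byte[1024];
        }

        long after = bean.getThreadAllocatedBytes(tid);
        System.out.printf("Allocated roughly %d KB across %d arrays%n",
                (after - before) / 1024, junk.length);
    }
}
```

Divide the delta by the elapsed time of a representative request to get an allocation rate you can compare against your GC log observations.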
Step 2: Select the Right Collector for Your Workload
Each collector has a personality. You cannot optimize for all metrics simultaneously.
| Collector | Strength | Weakness | Typical Use |
|---|---|---|---|
| G1 | Predictable pause times, adaptive sizing | Slightly higher CPU use | General-purpose servers |
| ZGC | Sub-millisecond pauses | More memory overhead | Low-latency apps |
| Shenandoah | Consistent latency, concurrent compaction | Extra CPU overhead from concurrent barriers | Data pipelines |
| Parallel | High throughput, simple | Long stop-the-world pauses | Batch jobs |
If you are handling event streams or API requests at high QPS, start with G1 or ZGC. Batch analytics may still prefer the Parallel GC. The choice depends on whether latency or total throughput matters more.
Step 3: Control Heap Regions and Survivor Ratios
After picking a collector, shape the heap layout to your workload.
- Young generation size (-XX:NewRatio or -XX:NewSize): If your application creates many transient objects, allocate a larger young generation. This reduces promotion churn.
- Survivor ratio (-XX:SurvivorRatio): Controls the size of each survivor space relative to eden. Smaller survivor spaces promote objects sooner, increasing old-gen pressure.
- Max heap size (-Xmx): Larger heaps reduce GC frequency but increase scan time per cycle. Always leave at least 20–30% headroom for the collector to breathe.
One useful technique is to plot object lifetime histograms from heap dumps. If most allocations die within seconds, invest in the young generation. If many live for minutes or more, focus on tuning the old generation collector.
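To check the headroom rule from inside the application, the standard MemoryMXBean reports current heap usage against the -Xmx ceiling. A minimal sketch (the class name HeapHeadroom and the 20% threshold are illustrative, matching the guideline above):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapHeadroom {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();

        long used = heap.getUsed();
        // getMax() reflects -Xmx; it can be -1 if undefined, so fall
        // back to the runtime's view of the maximum heap.
        long max = heap.getMax() > 0 ? heap.getMax() : Runtime.getRuntime().maxMemory();
        double headroom = 1.0 - (double) used / max;

        System.out.printf("Heap: %d MB used of %d MB (%.0f%% headroom)%n",
                used >> 20, max >> 20, headroom * 100);
        if (headroom < 0.2) {
            System.out.println("Less than 20% headroom: consider raising -Xmx");
        }
    }
}
```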
Step 4: Limit Pause Time Goals Intelligently
Collectors like G1 and ZGC allow you to set pause targets (-XX:MaxGCPauseMillis). This flag is not a guarantee but a hint. Unrealistic targets (for example, 5 ms in a system with a 64-GB heap) can backfire, causing the GC to work harder and burn CPU.
Start with a target consistent with your latency budget. For example, if your 99th percentile service time target is 200 ms, a GC pause target around 50–75 ms gives enough room for other components.
Monitor whether GC is meeting its target. If not, reduce allocation rate or add more heap rather than tightening the target.
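One lightweight way to monitor this in-process is the standard GarbageCollectorMXBean, which exposes cumulative collection counts and times per collector phase. The sketch below derives an average pause per collection (the class name GcPauseCheck is illustrative; averages hide outliers, so GC logs remain the source of truth for pause distributions):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPauseCheck {
    public static void main(String[] args) {
        // Each bean covers one collector phase, e.g. "G1 Young Generation".
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // collections so far
            long timeMs = gc.getCollectionTime();   // cumulative time in ms
            double avg = count > 0 ? (double) timeMs / count : 0.0;
            System.out.printf("%s: %d collections, %.1f ms average%n",
                    gc.getName(), count, avg);
        }
    }
}
```

If the averages drift above your pause target, that is the signal to revisit allocation rate or heap size rather than tighten -XX:MaxGCPauseMillis further.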
Step 5: Profile Allocation Hotspots
No tuning can save you if the application is generating garbage uncontrollably. Use allocation profiling tools to identify hotspots.
Common culprits include:
- Repeated creation of temporary collections or strings
- Boxing of primitives in tight loops
- Excessive logging under load
- Inefficient JSON serialization
Refactoring these to use object pools, preallocated buffers, or lightweight data structures can cut GC work by half. Mikael Sørensen’s team achieved a 40% throughput gain simply by reusing request buffers instead of recreating them on every call.
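A minimal sketch of the buffer-reuse idea: each worker thread keeps one long-lived byte buffer instead of allocating a fresh one per request. The class RequestBuffers, the handle method, and the 8 KB size are hypothetical names for illustration, not the implementation referenced above:

```java
public class RequestBuffers {
    private static final int BUF_SIZE = 8192;

    // One reusable buffer per thread; allocated once, then reused
    // for every request that thread handles.
    private static final ThreadLocal<byte[]> BUFFER =
            ThreadLocal.withInitial(() -> new byte[BUF_SIZE]);

    // Hypothetical request handler: fills the thread-local buffer
    // instead of allocating a new array on every call.
    static int handle(String payload) {
        byte[] buf = BUFFER.get();
        int n = Math.min(payload.length(), buf.length);
        for (int i = 0; i < n; i++) {
            buf[i] = (byte) payload.charAt(i); // ASCII-only sketch
        }
        return n; // bytes written; the buffer is reused on the next call
    }

    public static void main(String[] args) {
        System.out.println(handle("hello")); // prints 5
    }
}
```

The trade-off is lifetime management: pooled or thread-local buffers must never leak across requests, and they shift memory from short-lived eden allocations to a small, stable working set.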
Step 6: Verify with Load Testing
After each tuning round, test under production-like conditions. Tools such as Gatling, wrk, or JMeter can simulate real traffic. Track:
- Average and 99th percentile latency
- GC pause histogram
- CPU utilization
- Allocation rate and heap usage over time
Your goal is not zero GC activity—it is predictable GC activity. A system with regular, small pauses is healthier than one that runs smoothly for an hour and then halts for five seconds.
FAQs
Q: Should I always use ZGC for high throughput?
Not necessarily. ZGC’s low pause times help latency but come at a CPU cost. For CPU-bound services, G1 may achieve higher throughput.
Q: How big should my heap be?
Big enough that your old generation rarely fills, but small enough to avoid excessive GC scanning. Start at 60–70% of physical memory and adjust from there.
Q: Can tuning replace profiling?
No. GC tuning optimizes cleanup, not waste. If your code allocates too much, no GC flag will fix that.
Q: What about containerized environments?
Set explicit -Xmx and -Xms limits, or use -XX:MaxRAMPercentage. Modern JVMs (JDK 10 and later) detect container memory limits, but explicit settings keep heap sizing predictable across environments.
Honest Takeaway
Tuning garbage collection is a journey, not a one-time configuration. Every application’s allocation pattern evolves as features change and data grows. The best engineers treat GC tuning as part of continuous performance management, not as a rescue mission after latency spikes.
Real gains come from pairing thoughtful instrumentation with controlled experiments. Measure, adjust, and repeat. When your GC logs show steady rhythm and your throughput curve flattens into predictability, you will know you have found equilibrium—the kind of quiet stability that high-throughput systems live and die by.

