
Taming JVM Memory on JDK 25 — Part 1: The Serial GC Trap in Small Containers

A two-part series on the JVM defaults that quietly hurt you in Kubernetes. Part 1 covers GC ergonomics, the JDK 25 Serial GC behaviour change, and why your small containers are silently picking the wrong collector.

Michael Olsavsky
Software Engineer
April 21, 2026 · 24 min read

A two-part series on the JVM defaults that quietly hurt you in Kubernetes. Part 2 will cover the native-memory / glibc arena story.

What ergonomics actually pick for you

When you run a Java service in Kubernetes, you don't usually pick a garbage collector. You set -Xmx, maybe -XX:MaxRAMPercentage, and trust the JVM to do the right thing. It does — for some definition of "right" that was written for a bare-metal server in 2009.

The JVM's default GC selection is decided by ergonomics: at startup it inspects the environment and picks a collector. The rule is simple and, these days, mostly wrong. As of JDK 23 and still true in JDK 25, the JVM will select G1 by default unless the system has less than 1792 MB of available memory, or only a single processor, in which case Serial GC is selected (Inside.java — A Deep Dive into JVM Start-up; Oracle — Garbage Collection Ergonomics; JEP 248: Make G1 the Default Garbage Collector, which established the "server-class machine" cutoff).

If the process has access to ≥ 2 CPUs AND ≥ 1792 MB of memory → G1. Otherwise → Serial GC.
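You can preview which side of the cutoff a container lands on from inside the process. A minimal sketch using the standard management API — the com.sun.management OperatingSystemMXBean is container-aware on modern JDKs, so it reports the cgroup limit rather than the host total:

```java
import java.lang.management.ManagementFactory;
import com.sun.management.OperatingSystemMXBean;

public class ErgonomicsPreview {
    public static void main(String[] args) {
        OperatingSystemMXBean os = (OperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();
        // Container-aware on modern JDKs: this is the cgroup memory limit, not the node total.
        long availableMiB = os.getTotalMemorySize() / (1024 * 1024);
        int cpus = Runtime.getRuntime().availableProcessors();
        // The ergonomics rule from the text: G1 needs >= 2 CPUs AND >= 1792 MB.
        boolean g1 = cpus >= 2 && availableMiB >= 1792;
        System.out.printf("%d CPUs, %d MiB -> ergonomics would pick %s%n",
                cpus, availableMiB, g1 ? "G1" : "Serial GC");
    }
}
```

Running this in the same container image you deploy is a cheap way to catch a resize that would flip the collector before it ships.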

Serial GC is exactly what it sounds like: a single-threaded, stop-the-world collector that pauses every mutator thread to reclaim memory (Oracle — HotSpot Virtual Machine Garbage Collection Tuning Guide, Serial Collector; Inside.java — The Serial Garbage Collector). It was designed for single-core client machines and small embedded workloads. It is excellent at what it does — and wrong for a low-latency HTTP service.

In Kubernetes, it is very easy to land on the wrong side of that threshold without realising it. A pod with a 1.5 Gi limit falls below 1792 MB. A cpu limit of "1" (or "1000m") reports a single processor to the JVM. A dev replica shrunk to save cost silently flips the collector. The app boots, logs nothing alarming, and then some unlucky request blocks the world while Serial GC does a full sweep.
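To confirm which collector ergonomics actually picked — without reverse-engineering it from flags — list the registered GC beans. The names below are the ones HotSpot registers for Serial and G1:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class CollectorCheck {
    public static void main(String[] args) {
        // Serial GC registers beans named "Copy" and "MarkSweepCompact";
        // G1 registers "G1 Young Generation" and "G1 Old Generation".
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName());
            if (gc.getName().contains("MarkSweepCompact")) {
                System.err.println("WARNING: running on Serial GC — ergonomics picked it for you");
            }
        }
    }
}
```

A variant of this makes a decent startup assertion: fail fast in staging if the collector is not the one you intended.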

What changed in JDK 25

JDK 25 didn't fix the selection rule — that's coming later (see below). What it fixed is how Serial GC behaves once you're stuck with it.

Historically, Serial GC's out-of-the-box max heap is around 25% of available memory, a heuristic that dates from a pre-container world. The OpenJDK JEP draft 8350152 — Automatic Heap Sizing for the Serial Garbage Collector states this explicitly:

The long standing solution is to set maximum Java heap size to 25% of available memory. However, today many, if not most applications make use of containerization where that assumption no longer holds. The flag MaxRAMPercentage is one alternative, but while it does allow the JVM to define a max heap size that is a fraction of the memory made available to the container, it's just a guess just as the 25% value is just a guess.

In a container, that default meant two bad things at once: you lost ~75% of the memory you paid for because the JVM refused to use it as heap, and full GCs ran every time that artificially small heap filled — not when the container was actually under memory pressure.

JDK 25 actually changes Serial GC's runtime behaviour via JDK-8346920 — "avoid full GCs in some circumstances", summarised in Thomas Schatzl's JDK 25 G1 / Parallel / Serial GC changes. Combined with JDK-8345313 (Serial and Parallel GC no longer trigger spurious OutOfMemoryError due to JNI critical regions) in the Consolidated JDK 25 Release Notes, the concrete effect is that Serial GC now lets the heap grow much closer to the configured maximum before triggering a full collection, instead of firing on the old conservative heuristic.

This can surprise you on upgrade. If you move a workload from JDK 21 to JDK 25 and nothing else changes, you will see higher steady-state heap usage and longer gaps between full GCs. On a dashboard this looks like a memory leak — the heap that used to oscillate around 50–60% now sits at 80–90% until much later in the cycle. It isn't a leak; it's the collector being deliberately lazy because running a full GC when you still have headroom is wasteful. But if your alerting was calibrated against the old shape, it will page you. Expect to retune memory-usage alerts and to widen the headroom between MaxRAMPercentage and the container limit when adopting JDK 25 on Serial or Parallel GC — the collector's tolerance for "close to the limit" is now much higher than the JVM you remember.
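If your dashboards key off heap occupancy, the raw used/max ratio is the number whose shape changes on upgrade. A minimal sketch of reading it via the standard MemoryMXBean — nothing here is JDK-25-specific:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapOccupancy {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // On JDK 25 Serial GC, expect this ratio to ride much closer to 1.0
        // between full collections than it did on JDK 21.
        double occupancy = (double) heap.getUsed() / heap.getMax();
        System.out.printf("heap used %d MiB of %d MiB (%.0f%%)%n",
                heap.getUsed() >> 20, heap.getMax() >> 20, occupancy * 100);
    }
}
```

For alerting, usage measured after a collection is a far more stable signal than instantaneous usage, which now legitimately sits high for long stretches.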

This is a real improvement for anyone stuck on Serial GC — fewer full GCs, better memory utilisation, fewer "why did we pause for 800 ms?" incidents. But it's a consolation prize. Serial GC is still single-threaded and stop-the-world. JDK 25 made it a less wasteful one, not a low-latency one.

And this happened to us

We learned all of the above the expensive way. Shortly after rolling out JDK 25 to a set of smaller Spring Boot services, a handful of them started getting OOMKilled in production. Nothing in the application code had changed. The pods had been stable on JDK 21 for months. The only variable was the JDK bump.

The post-mortem landed on three factors stacking into the same minute:

  1. The containers were small enough to land on Serial GC by default. Limits around 1 vCPU / 1.5 Gi memory fell below the 1792 MB / 2-CPU ergonomics cutoff. Nobody had set -XX:+UseG1GC explicitly, so the JVM silently picked Serial. On JDK 21 this was already suboptimal but survivable.
  2. The heap was sized at 80% of the container limit. -XX:MaxRAMPercentage=80.0 had been a harmless default on JDK 21 — Serial GC's conservative heuristic kept the heap well below the cap in practice, so the "wasted" 20% was plenty of headroom for metaspace, code cache, thread stacks, Netty direct buffers, and glibc arena overhead.
  3. Then JDK 25's JDK-8346920 changed when Serial GC triggers a full collection. The new behaviour lets the heap grow much closer to -Xmx before cleaning. Suddenly the heap that used to oscillate around 50–60% was sitting at 85–95% for long stretches. The 20% headroom we thought we had was gone — the heap was genuinely allowed to occupy it now.

Individually, each of those was survivable. Together, they were not. The JVM committed heap into memory the kernel needed for native allocations, RSS bumped against the container limit, and the kernel OOM-killer resolved the disagreement. Because the pods were Burstable (memory requests != limits) and the node was moderately packed, a few of the kills came from node-level eviction rather than the per-container limit — the same SIGKILL, but with no corresponding JVM exit trace, which made the root-cause investigation harder than it should have been.

The fix was exactly the prescription in this article, applied in one change:

  • -XX:+UseG1GC forced explicitly, so we were no longer on a collector whose "good" behaviour is "use more memory before GC'ing."
  • -XX:MaxRAMPercentage=70.0 for containers under 4 GB (60% for the tightest ones), restoring real native-memory headroom.
  • Memory pinned at requests == limits, so the overcommit-at-the-node-level class of failure went away.

Zero code changes. The OOMKills stopped within the rollout window. If this sounds like a suspicious amount of our own setup being wrong — yes, it was. That's exactly why the JDK 25 Serial GC change is worth flagging: behaviours you'd calibrated against silently for years can shift under you, and the dashboards that looked healthy on the old JDK can start lying to you on the new one.

The recommendation: don't wait for the default to change

JEP 523 — Make G1 the Default Garbage Collector in All Environments is targeted at a future JDK and removes the 1792 MB / 2-CPU cutoff entirely. The rationale in the JEP itself is the whole argument for doing it yourself today:

G1's maximum throughput is close to that of Serial, G1's maximum latencies have always been better than Serial since it reclaims memory incrementally rather than with full collections, and G1's native memory usage has been reduced to levels comparable to Serial.

This is backed by concrete throughput work in JEP 522 — G1 GC: Improve Throughput by Reducing Synchronization, which targets JDK 26 and further reduces G1's synchronization overhead and write-barrier code size. You don't need to wait for it — today's G1 on JDK 25 is already competitive with Serial on small containers, and JDK 25's own performance work (Inside.java — Performance Improvements in JDK 25) improved G1's Mixed GC region selection and remembered-set memory use (shared G1CardSet across co-evacuated regions), reducing pause-time spikes further.

In short: on modern JDKs, there is no longer a small-container scenario where Serial beats G1 on a metric you care about. Throughput is comparable. Latency is dramatically better. Native overhead is comparable.

Don't wait for the default to change, and don't wait for JDK 26. If you're on JDK 25 today, set the collector explicitly for every containerised service:

-XX:+UseG1GC

Put it in your base image, Helm chart, or whatever template your services inherit from. There is no downside on modern hardware, and it eliminates an entire class of "why is this pod slow" incidents.

Sizing the heap: how much of the container can you actually use?

Once G1 is handling the heap, the next question is how big to make it. The usual answer — "max it out!" — is wrong in a container, because the heap is not the only thing consuming RSS. You need headroom for:

  • Metaspace and compressed class space
  • Thread stacks (~1 MB × thread count)
  • Code cache and JIT-compiled methods
  • Direct buffers (Netty, NIO, anything using ByteBuffer.allocateDirect)
  • GC internal data structures (G1 remembered sets, card tables)
  • Native allocations via JNI and the glibc allocator (see Part 2)

The canonical accounting of these areas is covered in the HotSpot GC Tuning Guide and instrumented by Native Memory Tracking.

If you set -Xmx equal to the container limit, the kernel will OOMKill you the first time anything in that list expands. The -XX:MaxRAMPercentage flag, added in JDK 10 under JDK-8186315 and documented in the java tool reference, lets you express the heap as a fraction of the container limit instead, so the same flags survive resizing.

Pragmatic defaults that hold up well in production:

  • Containers ≥ 4 GB: -XX:MaxRAMPercentage=75.0 is usually safe. You have enough absolute headroom for native overhead.
  • Containers < 4 GB: cap at 70%. Native overhead doesn't scale linearly with container size, so the smaller the container, the larger the percentage that bucket takes.
  • Containers < 2 GB, or anything with heavy Netty / OTel / JNI: drop to 60%. Yes, it feels wasteful. It's cheaper than an OOMKill.
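Those rules of thumb fold into a tiny helper. maxRamPercentage below is a hypothetical function for illustration — it is not a JVM API, just the bullet list above expressed as code:

```java
public class HeapSizing {
    // Hypothetical helper mirroring the rules of thumb above — not a JVM API.
    static double maxRamPercentage(long containerLimitMiB, boolean heavyNativeUse) {
        if (heavyNativeUse || containerLimitMiB < 2048) return 60.0; // Netty/OTel/JNI-heavy or tiny
        if (containerLimitMiB < 4096) return 70.0;                   // small container
        return 75.0;                                                 // enough absolute headroom
    }

    public static void main(String[] args) {
        long limitMiB = 2048;
        double pct = maxRamPercentage(limitMiB, false);
        long heapMiB = (long) (limitMiB * pct / 100.0);
        System.out.println("-XX:MaxRAMPercentage=" + pct + " -> heap ~" + heapMiB + " MiB");
    }
}
```

Encoding the policy once (in a base image entrypoint or Helm helper) beats re-deriving the percentage per service.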

Pair it with InitialRAMPercentage set to the same value — the JVM grabs the heap up front, the pages are touched, and you see the real memory cost immediately rather than discovering it under load three hours later.

-XX:+UseG1GC
-XX:InitialRAMPercentage=70.0
-XX:MaxRAMPercentage=70.0

Don't go too small

There's a temptation, especially in cost-conscious platforms, to shrink JVM containers aggressively — 500m CPU, 768Mi memory, pack more replicas per node. For a Go or Node.js service this is often fine. For a JVM it rarely is, and Microsoft's Containerize your Java Applications for Kubernetes guidance calls this out explicitly:

For any GC other than SerialGC, Microsoft recommends two or more vCPU cores — or at least 2000m for cpu_limit on Kubernetes.

That recommendation isn't arbitrary. Several JVM subsystems scale directly with the reported CPU count, and they mis-size badly at low values:

  • GC selection — below 2000m CPU limit or below 1792 MB memory, ergonomics picks Serial GC (see the top of this article). Even if you override with -XX:+UseG1GC, G1's concurrent threads and remembered-set work still scale with CPU count. A single-vCPU pod has no parallelism to give G1.
  • JIT compiler threads. The number of C1/C2 compiler threads is derived from Runtime.availableProcessors() (HotSpot CICompilerCount ergonomics). Constrained CPU means slower warm-up — the first few minutes of traffic hit the interpreter before hot methods are compiled, and your p99 for that window is brutal.
  • ForkJoinPool and the common pool default to availableProcessors() - 1, which clamps to 1 on a single-CPU container. Anything built on parallel streams, CompletableFuture.supplyAsync, or libraries that use the common pool (lots of them) becomes effectively serial.
  • Netty event loops default to availableProcessors() * 2 (NettyRuntime.availableProcessors). On a 1-CPU pod that's 2 event loops total — for every reactor and every client (Lettuce, WebClient, gRPC, each with its own pool unless you share resources; see Part 2).
  • Per-thread and per-subsystem fixed costs don't shrink. Metaspace, code cache, JIT structures, GC internals, and the glibc arenas from Part 2 don't scale down below some floor. On a 512 MB container the non-heap overhead can be 250–300 MB, leaving very little for actual application memory.
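Several of those derived sizes can be observed from inside the process. A sketch — the Netty figure is stated as an assumption (it is the default when io.netty.eventLoopThreads is not overridden), computed here rather than read from Netty itself:

```java
import java.util.concurrent.ForkJoinPool;

public class CpuDerivedSizing {
    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Common-pool parallelism defaults to max(availableProcessors() - 1, 1)
        // unless overridden via java.util.concurrent.ForkJoinPool.common.parallelism.
        int commonPool = ForkJoinPool.commonPool().getParallelism();
        // Assumed Netty default: 2 * availableProcessors() event loops per group.
        int nettyDefault = cpus * 2;
        System.out.printf("cpus=%d commonPool=%d nettyEventLoopsDefault=%d%n",
                cpus, commonPool, nettyDefault);
    }
}
```

On a 1-CPU pod this prints a common-pool parallelism of 1 — which is exactly why "effectively serial" is not an exaggeration.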

Microsoft's article concludes:

If you don't know how many cores to start with, a good choice is two vCPU cores.

The pragmatic starting point

For most production JVM workloads, the defensible starting point is:

resources:
  requests:
    cpu: "1"
    memory: "2Gi"
  limits:
    cpu: "2"
    memory: "2Gi"

That is: memory request equals memory limit (non-negotiable for JVM services — explained below), CPU request of 1 with a burst ceiling of 2. The memory request pins what Kubernetes guarantees; the CPU range lets the pod burst into spare node capacity while guaranteeing it at least one whole core.

This produces a Burstable pod (because CPU requests != limits), not Guaranteed. That's a deliberate trade-off. Microsoft's article advocates requests == limits on both CPU and memory to reach Guaranteed QoS, and if you're running critical, latency-sensitive services on an always-busy cluster, follow that stricter guidance. For the 80% case — mainstream backend services on shared clusters with some headroom — the 1 → 2 CPU split buys you better bin-packing and burst capacity without giving up the thing that actually matters (memory being pinned). You should still:

  • Size the heap with -XX:MaxRAMPercentage=70.0 against the 2 GiB memory limit (heap ≈ 1.4 GiB, leaving ~600 MiB for native overhead).
  • Set -XX:ActiveProcessorCount=2 if you want the JVM's internal sizing stable against the limit, not subject to cgroup quota throttling behaviour (HotSpot -XX:ActiveProcessorCount, Microsoft guidance above).
  • Scale horizontally, not vertically, once this container shape is saturated. Two replicas at 1 → 2 CPU / 2 GiB beat one replica at 2 → 4 CPU / 4 GiB for availability and rolling updates.

Services that genuinely justify more (large heaps, heavy caches, ML inference, bulk pipelines) should be sized on their own merits. But below 1 CPU request / 2 CPU limit / 2 GiB memory, you are fighting the JVM's design assumptions, and the savings in scheduling efficiency usually evaporate in worse p99 latency and slower cold starts.

How the JVM actually reads your Kubernetes resources

Before we set requests and limits, it's worth knowing what the JVM does with them. Spoiler: it reads limits, and it largely ignores requests. This surprises people who think "the JVM is container-aware" means "the JVM understands Kubernetes." It doesn't. It understands cgroups, and Kubernetes translates your Pod spec into cgroup files in a specific, lossy way.

What Kubernetes writes to cgroups

Kubernetes is just a user of the Linux cgroup interface — the Kubernetes Resource Management docs spell this out, and the detailed mapping lives in the kubelet's CRI / cgroup driver implementation. The short version:

| Pod spec | cgroup v1 file | cgroup v2 file | Effect |
| --- | --- | --- | --- |
| requests.cpu | cpu.shares | cpu.weight | Proportional scheduling weight. Not a hard floor or ceiling. |
| limits.cpu | cpu.cfs_quota_us + cpu.cfs_period_us | cpu.max | Hard CFS throttling ceiling. |
| requests.memory | — (scheduler hint only) | — (scheduler hint only) | Not enforced at the cgroup level. |
| limits.memory | memory.limit_in_bytes | memory.max | Hard kernel-enforced ceiling. The OOM-killer fires above it. |

The crucial asymmetry for JVM sizing: memory requests aren't written to any cgroup file the JVM can see. They only influence scheduling. If you set requests.memory: 1Gi, limits.memory: 4Gi, the JVM sees 4 GB, not 1 GB.
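You can read the same cgroup files the JVM reads. A sketch for cgroup v2 hosts that degrades gracefully elsewhere — the paths are the standard /sys/fs/cgroup locations; on v1 the equivalents live under per-controller directories such as memory/memory.limit_in_bytes:

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class CgroupPeek {
    public static void main(String[] args) throws Exception {
        Path memMax = Path.of("/sys/fs/cgroup/memory.max");
        Path cpuMax = Path.of("/sys/fs/cgroup/cpu.max");
        if (Files.exists(memMax)) {
            // "max" means unlimited; otherwise a byte count = limits.memory
            System.out.println("memory.max = " + Files.readString(memMax).trim());
        }
        if (Files.exists(cpuMax)) {
            // "quota period" in microseconds, e.g. "200000 100000" for limits.cpu: "2"
            System.out.println("cpu.max    = " + Files.readString(cpuMax).trim());
        }
    }
}
```

Comparing this output with -Xlog:os+container=trace is a quick sanity check that the JVM detected what Kubernetes actually wrote.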

What the JVM reads

The JVM has been cgroup-aware since JDK 10 via JDK-8146115, with the v2 implementation tracked under JDK-8230305. At startup, HotSpot reads the container's cgroup files once and caches the values. You can inspect what it actually detected with -Xlog:os+container=trace or jcmd <pid> VM.info.

Memory. The JVM reads only the limit (memory.max / memory.limit_in_bytes). It doesn't know or care what the memory request was. That limit is what MaxRAMPercentage is computed against — so the 70% number from earlier is 70% of limits.memory. This is why setting requests.memory lower than limits.memory is a trap for JVM services: the JVM sizes the heap for the limit, the scheduler packs pods based on the request, and you discover under node pressure that there isn't enough physical memory to satisfy all the heaps that got sized optimistically.

CPU. More nuanced, and it has actually changed. Three cgroup values are potentially in play:

  1. cpu.cfs_quota_us / cpu.cfs_period_us (v1) or cpu.max (v2) — derived from limits.cpu.
  2. cpu.shares (v1) or cpu.weight (v2) — derived from requests.cpu.
  3. The host's online CPU count — sysconf(_SC_NPROCESSORS_ONLN).

Historically the JVM computed min(quota, shares, host_cpus), which led to absurd results like "my pod with requests.cpu: 100m thinks it has 0.1 CPUs and runs single-threaded." JDK-8197867 added -XX:+PreferContainerQuotaForCPUCount (default true) so the JVM would prefer the quota (the limit) over the shares (the request), and JDK-8281571 — Do not use CPU Shares to compute active processor count removed CPU shares from the calculation entirely. As of JDK 19+, the JVM derives its active processor count from limits.cpu alone (bounded by the host CPU count), and the old PreferContainerQuotaForCPUCount / UseContainerCpuShares flags are deprecated (deprecation tracked under JDK-8281571).

The practical consequence: on JDK 25, limits.cpu is what the JVM sees. requests.cpu affects scheduler packing but not the JVM's internal sizing. This is the behaviour Red Hat's Java 17 container-awareness write-up documents in detail, and Mike my bytes — Kubernetes CPU limits: when JVM sees more than it should covers the edge cases (fractional limits round up, unset limits fall through to the host CPU count).
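The JDK 19+ rule is small enough to sketch directly. activeProcessors below is an illustrative re-implementation of the described behaviour, not the HotSpot source:

```java
public class ActiveProcessorCount {
    // Sketch of the JDK 19+ rule: fractional CPU limits round up via ceil(quota/period);
    // an unset limit (quota <= 0) falls through to the host CPU count.
    static int activeProcessors(long quotaUs, long periodUs, int hostCpus) {
        if (quotaUs <= 0) return hostCpus;
        int fromQuota = (int) Math.ceil((double) quotaUs / periodUs);
        return Math.min(fromQuota, hostCpus);
    }

    public static void main(String[] args) {
        // limits.cpu: "1500m" -> quota 150000us over a 100000us period -> 2 processors
        System.out.println(activeProcessors(150_000, 100_000, 64));
        // no limit set -> host CPU count
        System.out.println(activeProcessors(-1, 100_000, 64));
    }
}
```

The two edge cases the text mentions fall straight out: 1500m becomes 2 processors, and an unset limit silently becomes "the whole node".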

Why this matters for heap sizing

Three concrete implications fall out of the above:

  1. If you don't set limits.memory, MaxRAMPercentage is computed against the host's total memory. On a 256 GB node, a 70% MaxRAMPercentage means a 179 GB heap. The JVM will happily commit a heap that dwarfs your actual pod share, and the kernel OOM-killer will resolve the disagreement. Always set limits.memory on JVM pods.
  2. If you don't set limits.cpu, the JVM sees the host CPU count. It sizes GC threads, ForkJoinPool, and the common pool for a 64-core machine when you actually have 2 cores of quota. That mis-sizing is a separate class of pain from the glibc-arena story in Part 2, but it has the same root cause: the JVM trusting a cgroup value that isn't representative.
  3. If limits > requests, the JVM plans for the limit. The heap is sized for limits.memory, the GC for limits.cpu. Kubernetes packs the node based on requests. This is overcommit. For memory, that's unacceptable — pin it. For CPU, it's usually fine and is the point of Burstable scheduling.

Memory: pin it. CPU: let it burst.

A Pod's QoS class in Kubernetes is decided by the relationship between requests and limits (Kubernetes — Pod Quality of Service Classes):

  • requests == limits on CPU and memory for every container → Guaranteed
  • Otherwise, at least one request set → Burstable
  • Nothing set → BestEffort

When a node runs low on memory, the kubelet evicts pods starting with BestEffort, then Burstable, and touches Guaranteed pods last (Kubernetes — Node-pressure Eviction). A pod whose memory isn't pinned can be killed by the kernel's OOM-killer not because your container exceeded its limit, but because the node is under pressure and your pod is a cheap victim. The SIGKILL looks identical in both cases — from inside the container the process just vanishes.

The two halves of this problem deserve different treatment:

Memory — always requests == limits. A JVM holds onto the heap it commits; there is no "return memory to the node under pressure" behaviour the way there is with Go or Node.js. If the memory request is lower than the limit, the scheduler packs based on the smaller number and the node is overcommitted on a resource the JVM will not release. Pin memory at the number you've sized the heap for. No exceptions for prod JVM services.

CPU — let the limit exceed the request. This is fine. CPU is a perfectly reclaimable resource. If a node is under CPU pressure, CFS throttles your container without killing anything. The worst case of cpu.limits > cpu.requests is throttling-induced latency under contention; the worst case of the same shape on memory is the pod dying. There is no equivalent asymmetry to worry about on CPU, so a CPU range is a legitimate and often desirable optimisation: you get scheduling-friendly requests (good bin-packing, room for HPA signal), while the JVM — which reads the limit, per the cgroup section above — still sizes its thread pools for the burst ceiling you actually want it to use.

This is why the practical shape from earlier works:

resources:
  requests:
    cpu: "1"       # scheduler packs for 1 core
    memory: "2Gi"  # pinned — must equal limit
  limits:
    cpu: "2"       # burst ceiling; JVM sees 2 CPUs
    memory: "2Gi"  # pinned

This is Burstable QoS, not Guaranteed. That's a conscious trade. You lose the node-pressure eviction protection that Guaranteed provides, but because memory is pinned, the kernel OOM-killer is not going to target you for being a cheap victim — your memory.max is honoured per container, and a Burstable pod whose memory usage stays within its own request is not preferred over one whose usage has ballooned past its request. In practice, the memory-request-pinning is what buys you safety here, not the full Guaranteed label.

If you're running a latency-critical tier 0 service on a hot cluster, upgrade to full Guaranteed (set CPU request to match the limit) — Microsoft's Containerize your Java Applications for Kubernetes explicitly recommends requests == limits on both CPU and memory for exactly this reason:

If you must limit the CPU, ensure that you apply the same value for both limits and requests in the deployment file. The JVM doesn't dynamically adjust its runtime, such as the GC and other thread pools. The JVM reads the number of processors available only during startup time.

That guidance is correct and worth following when the pod is important enough to not share CPU with anyone. For mainstream production services, the 1 → 2 CPU split is a reasonable default.

The one thing that is never a reasonable default is "set low requests so you can pack more pods onto a node" applied to memory. The JVM is not elastic in the way a Node.js or Go process is — it claims a heap and holds it. Treat the memory limit as the real memory allocation and pay for it honestly.

Putting it together

For a JDK 25 containerised Java service, the safe starting point is:

-XX:+UseG1GC
-XX:InitialRAMPercentage=70.0
-XX:MaxRAMPercentage=70.0
-XX:+ExitOnOutOfMemoryError

Combined with the resource shape above (1 → 2 CPU, 2 GiB memory pinned), you get:

  • G1's incremental, low-pause collection (no surprise full GCs).
  • Heap sized for the container, with real native-memory headroom.
  • Memory pinned (requests == limits), so the kernel doesn't target you under node memory pressure.
  • CPU free to burst into spare node capacity while the JVM sizes its thread pools for the 2-core ceiling.
  • A consistent failure mode — if the JVM genuinely runs out of heap, the process exits cleanly and Kubernetes restarts it (HotSpot options — ExitOnOutOfMemoryError).

These are defaults, not destinations

Everything above is our recommended default for small-to-midsize containerised JVM services — the shape you start with so you stop thinking about GC selection, heap sizing, and pod resources every time a new service ships. The goal is to remove mental overhead and eliminate the most common "why did this pod die?" incidents, not to hand you a globally optimal configuration.

Real GC and memory tuning is an empirical exercise. The collector, heap size, arena count, and pod shape that work best for your service depend on your allocation profile, thread count, GC pause budget, traffic pattern, and what you're sharing the node with. There is no substitute for:

  • Running GC logs (-Xlog:gc*:file=gc.log:time,uptime,level,tags) and analysing them (GCEasy, GCViewer).
  • Comparing -XX:MaxRAMPercentage values under representative load and measuring p50 / p99 latency, not just whether the pod stays up.
  • Load-testing your actual service, not a synthetic benchmark, because library choices (Netty, Lettuce, OTel, JDBC drivers) dominate real-world allocation behaviour.
  • Tuning once, measuring, and re-tuning — not setting flags once and declaring victory.

The defaults in this article are calibrated against the 80% case: a moderately-sized Spring Boot service in a 1 → 2 CPU / 2 GiB pod on JDK 25, with traffic patterns typical of internal microservices. For that case, these settings get you a boring, stable baseline with minimum effort. Anything outside that — large heaps, latency-critical paths, batch workloads, ML inference, message-bus consumers with bursty allocation — deserves its own tuning pass.

And even with all of this dialled in, you can still get OOMKilled while the heap stays flat. That's a native-memory problem — the one almost nobody gets right out of the box. That's Part 2.


#jvm #kubernetes #performance #java