The first time a high-traffic app falls over, it rarely looks dramatic. Pages still load, just a little slower. Then your database CPU climbs, your p95 latency doubles, and one innocent cache expiration turns into a convoy of identical requests hammering the same origin. Suddenly, the problem is not speed. It is survival.
Caching, in plain English, is the practice of storing data closer to the user or closer to your application so you do not have to recompute or refetch it every time. In high-traffic systems, that means browsers keep static assets, CDNs absorb global read traffic, and in-memory stores like Redis or Memcached shield databases from repeated hot reads. Done well, caching cuts latency and infrastructure cost. Done badly, it creates stale data, brittle invalidation rules, and some very expensive incidents.
We pulled together guidance from protocol standards, platform docs, and engineering write-ups because the useful advice here is not “add Redis.” It is knowing which layer should cache what, how stale your data is allowed to be, and what happens when a million requests all miss at once. That is where architecture earns its keep.
What the protocol people and operators are really saying
Mark Nottingham, IETF, framed stale-while-revalidate around a simple idea: a cache may keep serving a stale response for a bounded window while revalidation happens in the background, instead of making the user wait on the origin. The practical implication is bigger than the header name suggests. You are turning freshness into a budgeted trade-off rather than an all-or-nothing decision.
Jeff Posnick, Google Chrome team, explained the same pattern from the browser side. Pair max-age with stale-while-revalidate and you get three states instead of two: fresh, briefly stale but still servable, and truly expired. That extra middle state is gold for high-traffic apps because it reduces blocking fetches during the exact period when traffic tends to spike around expiring objects.
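To make those three states concrete, here is a minimal sketch of the decision a cache ends up making under that header pair. The 60-second and 30-second values, and the function itself, are illustrative rather than taken from any spec or library.

```python
MAX_AGE = 60                  # from Cache-Control: max-age=60
STALE_WHILE_REVALIDATE = 30   # from Cache-Control: stale-while-revalidate=30

def cache_state(age_seconds: float) -> str:
    """Classify a cached response under max-age plus stale-while-revalidate."""
    if age_seconds < MAX_AGE:
        return "fresh"                        # serve from cache, no origin contact
    if age_seconds < MAX_AGE + STALE_WHILE_REVALIDATE:
        return "stale-but-servable"           # serve stale now, revalidate in the background
    return "expired"                          # block on a fetch before serving
```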
Thibault Meunier, Cloudflare, focused on the operational failure mode teams actually feel in production: cache stampede. His point is blunt. When a hot object expires, many requests can rush the origin at once; locks or request collapsing stop that flood, and probabilistic revalidation can reduce origin pressure without adding a separate locking hop.
Put those three views together and you get a sober rule: the best caching strategy is not the most aggressive one. It is the one that makes freshness explicit, absorbs misses gracefully, and fails soft when the origin is sick.
Match the cache layer to the pain you are trying to remove
Most teams get in trouble because they use one cache to solve every problem. High-traffic apps usually need a layered design, where each cache serves a different purpose. HTTP caches and CDNs are best for public, repeatable responses. Application caches are best for hot objects and expensive queries. Write-path caches are a separate decision, because write latency and consistency involve different trade-offs than read latency does.
| Layer | Best for | Main risk |
|---|---|---|
| Browser / CDN cache | Static assets, public GETs, edge delivery | Serving stale or incorrectly shared content |
| App-side object cache | Product data, profiles, computed fragments | Stampedes and inconsistent invalidation |
| Write-through cache | Read-heavy systems needing fresher cache | Higher write latency |
| Write-behind cache | Very high write throughput | Data loss or lag on async persistence |
That split is not theoretical. Shared caches like CDNs behave differently from private browser caches, and personalized content must not be treated like public content. Cache-aside remains the common database shielding pattern, while write-through is synchronous and write-behind is asynchronous, faster for writes but riskier for durability.
Build your caching strategy from the outside in
Start at the edge. If a response can be shared safely, cache it in the browser and CDN before you touch Redis. Cache-Control gives you the basic levers: max-age for freshness, s-maxage for shared caches, private for user-specific content, and no-store when nothing should be retained. stale-while-revalidate lets shared caches and supporting clients serve an expired object briefly while refreshing it in the background.
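One way to keep those levers consistent across endpoints is a small policy table. A sketch, with categories and values that are illustrative defaults rather than recommendations:

```python
# Hypothetical mapping from response category to a Cache-Control value.
CACHE_POLICIES = {
    "static_asset":   "public, max-age=31536000, immutable",   # fingerprinted JS/CSS/images
    "public_page":    "public, s-maxage=300, max-age=60, stale-while-revalidate=30",
    "user_dashboard": "private, max-age=0, no-cache",          # per-user, always revalidate
    "auth_response":  "no-store",                               # never retained anywhere
}

def cache_control_for(category: str) -> str:
    """Return the Cache-Control value for a response category, failing closed to no-store."""
    return CACHE_POLICIES.get(category, "no-store")
```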
Then add an application cache for the data your origin keeps fetching. Cache-aside, also called lazy loading, is still the most common database caching strategy: check cache, fall back to the database on a miss, then populate the cache. It is popular because it only stores data that is actually requested, which keeps memory use efficient and implementation complexity manageable.
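A minimal cache-aside sketch, assuming a redis-py client; `fetch_product_from_db` is a stand-in for the real query, not a library call:

```python
import json
import redis  # assumes the redis-py client; any key-value store works the same way

cache = redis.Redis()
TTL_SECONDS = 300  # illustrative freshness window

def fetch_product_from_db(product_id: int) -> dict:
    # Placeholder for the real database query.
    return {"id": product_id, "name": "example"}

def get_product(product_id: int) -> dict:
    """Cache-aside (lazy loading): check the cache, fall back to the database, populate on miss."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                         # hit: no database work
    product = fetch_product_from_db(product_id)           # miss: go to the system of record
    cache.set(key, json.dumps(product), ex=TTL_SECONDS)   # populate so the next reader hits
    return product
```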
After that, decide how writers should behave. For data where reads dominate and small staleness windows are acceptable, cache-aside plus explicit invalidation is often enough. For data where the cache must stay closely aligned with the system of record, write-through updates the cache immediately after the database change. Write-behind improves write performance by accepting asynchronous persistence to the backend store, but it introduces more operational risk.
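Sketched in the same style, the difference between the two write paths is mainly when the backing store gets updated; `db_save` and the queue are placeholders for whatever persistence and worker you actually run:

```python
import json
import queue
import redis

cache = redis.Redis()
write_queue: queue.Queue = queue.Queue()   # drained by a background worker (not shown)

def db_save(key: str, value: dict) -> None:
    # Placeholder for the real database write.
    pass

def save_write_through(key: str, value: dict) -> None:
    """Write-through: persist to the database, then update the cache synchronously."""
    db_save(key, value)                    # the caller waits for durable persistence
    cache.set(key, json.dumps(value))      # cache is fresh immediately after the write

def save_write_behind(key: str, value: dict) -> None:
    """Write-behind: update the cache now, persist asynchronously (faster writes, more risk)."""
    cache.set(key, json.dumps(value))      # readers see the new value right away
    write_queue.put((key, value))          # a worker calls db_save later; a crash can lose this
```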
Here is the mental model that usually works:
- Cache public responses at the edge first.
- Cache hot database reads in memory for a second.
- Use write-through only where freshness matters.
- Use write-behind only where replay and loss are tolerable.
That order keeps you from paying application-cache complexity for traffic your CDN could have absorbed for free.
Prevent stampedes before your origin becomes the bottleneck
A cache hit ratio can look healthy and still hide a dangerous system. Imagine a product detail endpoint handling 20,000 requests per second. At a 95% hit rate, only 1,000 requests per second reach the origin. That sounds fine until a hot key expires and ten or twenty app instances all refill it simultaneously. Now your effective origin load can spike far above the average case, exactly when the object is most popular. The math is simple: 20,000 × 5% = 1,000 misses per second, and stampedes make those misses clump together instead of staying evenly distributed.
This is where single-flight request collapsing matters more than another percentage point of hit rate. When one request wins permission to refresh and the others either wait briefly or keep serving stale content, the origin stays protected. Some systems use explicit locking. Others use probabilistic revalidation to reduce origin pressure without a central coordination step.
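One widely used version of that probabilistic idea refreshes early with a probability that grows as expiry approaches, so refills spread out instead of clumping at the expiry instant. A sketch, assuming you record roughly how long a recompute takes (`delta`) and tune `beta` around 1.0:

```python
import math
import random
import time

def should_refresh_early(expires_at: float, delta: float, beta: float = 1.0) -> bool:
    """Probabilistic early expiration: occasionally volunteer to refresh before the TTL runs out.

    `delta` is roughly how long a recompute takes; `beta` above 1.0 refreshes more eagerly.
    """
    # 1.0 - random() keeps the value in (0, 1] so log() never sees zero.
    gap = -delta * beta * math.log(1.0 - random.random())
    return time.time() + gap >= expires_at
```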
In most application stacks, you do not need an exotic algorithm to benefit. You need three boring safeguards. Add per-key request coalescing so one refill happens at a time. Add TTL jitter so a thousand keys do not expire on the same second. Add stale serving, at the edge or app layer, so a brief origin slowdown does not become user-visible. It also helps to support stale-on-error behavior so cached responses can still be served during temporary upstream failures.
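An in-process sketch of the first two safeguards, per-key coalescing plus TTL jitter. The lock map only coordinates threads inside one process, so a multi-instance deployment still needs a shared lock or the probabilistic refresh above on top of it:

```python
import random
import threading
from collections import defaultdict

_refill_locks = defaultdict(threading.Lock)   # one lock per cache key, within this process
BASE_TTL = 300                                # illustrative base TTL in seconds

def jittered_ttl(base: int = BASE_TTL, spread: float = 0.1) -> int:
    """Spread expirations by +/-10% so a whole key family does not expire in the same second."""
    return int(base * random.uniform(1 - spread, 1 + spread))

def get_with_coalescing(key: str, load, cache_get, cache_set):
    """Per-key request coalescing: only one caller refills a missing key at a time."""
    value = cache_get(key)
    if value is not None:
        return value
    with _refill_locks[key]:           # everyone else queues behind the winning refill
        value = cache_get(key)         # another thread may have refilled while we waited
        if value is None:
            value = load(key)          # a single trip to the origin
            cache_set(key, value, jittered_ttl())
        return value
```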
Treat invalidation like a product requirement
Everyone jokes that cache invalidation is one of the two hard problems in computer science, and the joke survives because it keeps proving true. The mistake is thinking invalidation is a technical detail. It is actually a business rule wearing an infrastructure costume.
If inventory counts must be exact, a five-minute TTL is not a caching strategy. It is a bug. If a news homepage can be thirty seconds behind but must stay up during origin trouble, that is a perfect candidate for stale-while-revalidate. If a user’s account page contains personalized or authenticated data, shared caches should not treat it like public content, and private or no-store may be appropriate depending on sensitivity.
That is why strong teams define freshness classes up front. You might have one class for “must be correct now,” another for “correct within 5 seconds,” and another for “correct within 5 minutes.” Once you do that, the cache policy gets easier. Inventory or balances lean toward direct reads, short-lived cache, or write-through. Catalog pages, recommendations, top charts, and CMS fragments usually tolerate cache-aside with edge caching and stale serving.
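Writing the classes down explicitly, even as a small table in code, keeps the policy reviewable. The class names, TTLs, and assignments below are illustrative, not a recommendation for any particular domain:

```python
# Hypothetical freshness classes and the cache behavior each one implies.
FRESHNESS_CLASSES = {
    "exact_now":    {"ttl_seconds": 0,   "pattern": "direct read or write-through"},
    "within_5_sec": {"ttl_seconds": 5,   "pattern": "cache-aside with a short TTL"},
    "within_5_min": {"ttl_seconds": 300, "pattern": "cache-aside plus edge caching and stale serving"},
}

# Which data families fall into which class is a product decision, not an infrastructure one.
DATA_TO_CLASS = {
    "inventory_count": "exact_now",
    "account_balance": "exact_now",
    "top_charts":      "within_5_min",
    "product_catalog": "within_5_min",
    "cms_fragment":    "within_5_min",
}
```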
Netflix’s EVCache story is useful here because it normalizes a truth many teams resist: not all replicated cache data needs immediate global consistency. Their global cache design accepts eventual consistency in exchange for performance and resilience, and separates local serving from cross-region replication concerns. That is a good reminder that “fresh enough” is often the winning design for large-scale read paths.
Measure the cache as if it were part of your database
A cache is not successful because Redis is up. It is successful when it removes expensive work without creating correctness surprises. That means the dashboard needs to go past hit ratio.
Track hit ratio by endpoint and by key family. A single global hit ratio can flatter you. Watch p50 and p95 latency at the cache, app, and database layers so you can see whether the cache is actually shaving meaningful tail latency. Count stampede events, refill concurrency, eviction rate, stale serves, and origin offload percentage. For edge caches, inspect headers and logs to understand whether requests are hitting cache, being revalidated, or punching through to origin.
There is also a practical capacity question. A cache may sustain sub-millisecond performance at very high operation rates in ideal conditions, but your real limit depends on key size, cardinality, eviction policy, and network topology. The metric that matters is not vendor maximum throughput. It is whether your hot set fits, whether eviction is predictable, and whether miss storms stay bounded.
FAQ
Which caching pattern should most teams start with?
For application data, cache-aside is still the default starting point because it is simple, stores only requested data, and works well for read-heavy paths. Add edge caching first for public responses, then use cache-aside behind it for hot origin reads.
When should you use write-through instead of cache-aside?
Use write-through when cache freshness needs to track database updates closely and the extra write latency is acceptable. It is the more consistency-friendly option, but you pay for that with extra coupling on the write path.
Is stale content actually safe to serve?
Yes, if you define where it is acceptable. Bounded stale serving during revalidation, and sometimes during origin errors, is a practical resilience tool. The key is making that tolerance intentional, not accidental.
What is the fastest way to reduce origin load this quarter?
Cache public GET responses at the CDN with sensible Cache-Control, enable stale serving where your product can tolerate it, and add request collapsing around hot-key refreshes. Those three changes often outperform a larger Redis project that ignores the edge.
Honest Takeaway
The uncomfortable truth is that caching is not one decision. It is a stack of decisions about where you are willing to be stale, where you absolutely cannot be wrong, and how much operational complexity you can afford. High-traffic systems win when they make those trade-offs explicit. Browser and CDN caches remove the cheap global traffic. Application caches remove expensive repeated work. Stampede protection keeps misses from becoming incidents. Invalidation policies keep the whole thing honest.
The best strategy is usually less glamorous than people expect. Start with edge caching, add cache-aside for your hottest reads, protect refills with coalescing and jitter, and only move to write-through or write-behind when the business case is obvious. In high-traffic applications, the real superpower is not faster cache hits. It is making cache misses boring.
