If you’ve ever opened a website and thought, “Why did that load instantly?”, chances are caching did most of the work.
Modern web applications rarely serve every request from scratch. Instead, they rely on multiple caching layers working together. These layers store previously generated responses closer to the user, dramatically reducing latency, server load, and infrastructure costs.
At a high level, caching simply means saving a copy of data so it can be served faster later. But in real-world systems, caching isn’t just one thing. It usually happens across three main layers:
- Browser caching
- CDN (edge) caching
- Application-level caching
Understanding how these layers interact is essential if you want to build scalable systems. Many performance problems come not from missing caches, but from misconfigured ones fighting each other.
Let’s break down how each layer works and how they fit together in a modern architecture.
Why Modern Systems Use Multiple Caching Layers
A single cache cannot solve every performance problem. Each layer exists because different parts of the request lifecycle have different bottlenecks.
Consider a typical request:
User → Browser → CDN → Application Server → Database
Every hop introduces latency. Caching works best when data is stored as close to the user as possible.
This creates a hierarchy:
| Layer | Location | Purpose |
|---|---|---|
| Browser cache | On the user’s device | Prevents unnecessary network requests |
| CDN cache | Edge servers near users | Reduces distance to the server |
| Application cache | Backend servers | Avoids expensive database queries |
Think of it like a grocery supply chain.
- Browser cache is your refrigerator.
- CDN cache is the neighborhood store.
- Application cache is the warehouse.
If something is already in your fridge, you don’t drive to the store. The same logic applies to HTTP requests.
Browser Caching: The First Line of Defense
The browser cache sits directly on the user’s device. It stores static assets like:
- Images
- CSS files
- JavaScript bundles
- Fonts
- Sometimes HTML pages
When the browser already has a resource locally, it can skip the network entirely.
How Browser Caching Works
Browsers rely primarily on HTTP headers such as:
- Cache-Control
- Expires
- ETag
- Last-Modified
For example:
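```http
Cache-Control: max-age=31536000, immutable
```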
This tells the browser it can reuse the resource for one year without checking the server.
Two Common Browser Cache Strategies
1. Strong caching
The browser uses the cached file without contacting the server.
Example:
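```http
Cache-Control: max-age=86400
```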
The file is reused for 24 hours.
2. Validation caching
The browser asks the server if the file changed.
Example (the ETag value below is illustrative):
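```http
HTTP/1.1 200 OK
ETag: "abc123"
```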
The browser sends:
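```http
If-None-Match: "abc123"
```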
If unchanged, the server returns:
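```http
HTTP/1.1 304 Not Modified
```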
This response contains no body, making it much faster than downloading the file again.
CDN Caching: Moving Content Closer to Users
A Content Delivery Network (CDN) sits between users and your origin server. Popular providers include:
- Cloudflare
- Fastly
- AWS CloudFront
- Akamai
CDNs operate edge servers distributed around the world. Instead of every user requesting data from your origin server, requests are served from the nearest edge node.
How CDN Caching Works
When a request reaches the CDN:
1. The CDN checks whether the content exists in the edge cache
2. If yes → serve it immediately
3. If no → fetch it from the origin server
4. Store the origin's response for future requests
This dramatically reduces:
- latency
- origin server load
- bandwidth usage
Example Flow
User in Germany requests:
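```
https://example.com/assets/app.js
```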
Without CDN:
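```
Germany → origin server (assumed here to be in the US) → Germany
```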
With CDN:
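```
Germany → nearby edge node (e.g., Frankfurt) → Germany
```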
Response time can drop from 200–300ms to under 20ms.
CDN Cache Headers
CDNs respect the same Cache-Control directives browsers do, plus CDN-oriented ones such as:
- s-maxage (a Cache-Control directive aimed at shared caches like CDNs)
- Surrogate-Control
Example optimized for CDNs:
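```http
Cache-Control: max-age=300, s-maxage=3600
```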
Meaning:
- Browser cache: 5 minutes
- CDN cache: 1 hour
Application-Level Caching: Avoiding Expensive Backend Work
Even with browser and CDN caching, dynamic requests still reach your backend.
Application caching reduces the cost of generating responses by storing computed results in memory or fast storage.
Common tools include:
- Redis
- Memcached
- in-process caches (like Node LRU cache)
- database query caches
Example Problem
Imagine an API endpoint (the route name here is illustrative):
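```
GET /api/posts/top
```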
Without caching:
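```
Request → database (SELECT ... ORDER BY score DESC LIMIT 10) → response
```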
If the query runs 10,000 times per minute, your database becomes the bottleneck.
With Application Cache
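```
Request → Redis → cache hit → return cached result
```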
If cache misses:
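```
Request → Redis (miss) → database query → store result in Redis → return result
```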
This pattern is called cache-aside.
Example pseudocode:
posts = redis.get("top_posts")          # try the cache first
if posts is None:                       # cache miss
    posts = database.query("SELECT * FROM posts ORDER BY score DESC LIMIT 10")
    redis.set("top_posts", posts, ttl=300)  # cache the result for 5 minutes
return posts
Now the expensive query runs once every 5 minutes instead of thousands of times.
How These Layers Work Together
The real power comes from combining them.
A request typically flows like this:
Request
↓
Browser Cache
↓
CDN Edge Cache
↓
Application Cache
↓
Database
Most requests never reach the bottom of the stack.
Example distribution in a well-optimized system:
- 60–80% served by the browser cache
- 15–30% served by the CDN
- 5–10% reach the application
- <1% hit the database
This layered architecture is how companies like Netflix, Shopify, and Reddit absorb enormous request volumes without overwhelming their origin infrastructure.
How to Design a Practical Caching Strategy
Here’s a simple approach used in many production systems.
Step 1: Cache Static Assets Aggressively
Use long TTLs for immutable files.
Example:
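```http
Cache-Control: public, max-age=31536000, immutable
```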
Pair this with content hashing:
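```
app.js → app.8f3a92c.js   (hash derived from the file contents; value illustrative)
```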
If the file changes, the name changes.
Step 2: Use CDN Edge Caching for Public Pages
Cache pages that do not depend on user identity.
Good candidates:
- blog posts
- documentation
- marketing pages
CDN edge caching often provides the largest performance gain.
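A typical header for such pages might look like this (the TTL values are illustrative):
```http
Cache-Control: public, s-maxage=86400, stale-while-revalidate=60
```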
Step 3: Cache Expensive Backend Queries
Use Redis or Memcached for:
- API responses
- expensive database queries
- session data
Focus on endpoints with:
- high request volume
- expensive computation
Step 4: Plan for Cache Invalidation
Caching is easy. Cache invalidation is the hard part.
Common strategies include:
- TTL expiration
- cache busting
- versioned keys
- event-based invalidation
For example:
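```
top_posts:v1   ← old key, no longer read; entries age out via TTL
top_posts:v2   ← new key used after the change
```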
Changing the version automatically invalidates older cache entries.
Common Caching Mistakes Engineers Make
Even experienced developers run into caching problems.
Here are the big ones.
Caching personalized content
Never cache user-specific responses globally.
Example (a hypothetical user-specific endpoint):
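```
GET /api/account/settings
```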
Responses like this should bypass CDN caching, for example by being served with Cache-Control: private.
Forgetting cache headers
Without headers like Cache-Control, neither browsers nor CDNs know what they are allowed to cache.
Over-caching dynamic data
Real-time data like stock prices or chat messages should not have long TTLs.
Ignoring cache observability
You should monitor:
- cache hit ratio
- latency improvements
- eviction rates
Without this data, you cannot tune caching effectively.
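As a starting point for hit-ratio monitoring, Redis tracks lifetime keyspace hit and miss counters. A minimal sketch with the redis-py client:
```python
import redis

r = redis.Redis()

# The "stats" section of INFO includes lifetime hit/miss counters.
stats = r.info("stats")
hits, misses = stats["keyspace_hits"], stats["keyspace_misses"]
total = hits + misses
print(f"cache hit ratio: {hits / total:.1%}" if total else "no lookups yet")
```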
FAQ
What is a cache hit vs cache miss?
A cache hit means the requested data was found in cache.
A cache miss means the system had to fetch the data from the source.
Higher hit ratios mean better performance.
Which caching layer provides the biggest speed boost?
Usually CDN caching, because it reduces the network distance between users and servers and helps even first-time visitors whose browser caches are empty.
Should APIs use CDN caching?
Yes, for public API responses. Many modern APIs cache responses at the edge to reduce backend load.
Is Redis required for application caching?
Not strictly. But Redis is popular because it provides:
- in-memory speed
- TTL expiration
- distributed caching
Honest Takeaway
Caching layers are one of the most powerful performance tools in modern web architecture. When used correctly, they can reduce server load by orders of magnitude and make applications feel instant to users.
But caching is not a magic switch. It requires thoughtful design around TTL strategies, invalidation rules, and observability. The teams that do this well treat caching as part of system architecture, not just an optimization added later.
If you remember one principle, make it this:
Cache as close to the user as possible, and only compute when you absolutely must.

