You ship an API that works perfectly in staging. Then production traffic hits. Mobile clients retry because of flaky networks. Users double-click submit. A background task takes 45 seconds instead of five. Support tickets start with a familiar line, “Did it go through?”
This is where asynchronous workflows earn their keep. Instead of forcing a client to hold an HTTP connection open while work completes, you accept the request, start processing in the background, and provide a reliable way to check status or receive the final result later. In plain terms, an asynchronous API acknowledges the request now and delivers the outcome later.
The HTTP building block for this pattern is 202 Accepted, which explicitly means the request has been received but not yet completed. The tricky part is not returning 202. The tricky part is designing everything around it so retries are safe, failures are visible, and clients never have to guess what happened.
What battle-tested teams consistently get right
If you study APIs that operate at massive scale, you see recurring patterns.
Stripe’s engineering team treats idempotency as a core API feature. Their public API supports idempotency keys for write operations, and their design choice is blunt: if a client retries with the same key, the server replays the original response, even if it was an error. That removes ambiguity under network failures and makes retries safe by default.
The HTTP specification and mainstream API guidance are equally clear about 202. When you accept work asynchronously, HTTP will not magically send a second response later. You must provide another mechanism to communicate completion, typically a status endpoint or a callback.
The OpenAPI Specification maintainers formalized callbacks for a reason. Enough real systems need to call clients back out of band that the contract deserves to be documented like any other API surface.
When you synthesize these perspectives, one theme emerges: async APIs are contract design problems disguised as infrastructure problems. Queues and workers are implementation details. The real work is defining a lifecycle that holds up under retries, partial failures, and reordering.
Model the workflow as a resource, not a side effect
The cleanest asynchronous APIs make the work itself addressable.
Here is the baseline pattern that scales:
- The client sends `POST /reports`.
- The server responds with `202 Accepted` and a stable URL such as `/jobs/{job_id}`.
- The client either polls that URL or waits for a webhook notification.
In this design, the job is a first-class resource with its own lifecycle. That is the key. You are not hiding background work behind a single endpoint. You are exposing a state machine.
A practical shape that ages well looks like this:
- `POST /reports` returns 202 and `Location: /jobs/{job_id}`
- `GET /jobs/{job_id}` returns `{ status, progress, result_url, error }`
- Optionally, `GET /jobs/{job_id}/events` for detailed timelines

Keep the state model boring. `queued → running → succeeded | failed | canceled` is enough for most systems. Add timestamps for transitions so clients can reason about time.
If you cannot explain the lifecycle in one paragraph, your clients will implement it incorrectly.
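From the client's side, this lifecycle reduces to a polling loop with backoff. Here is a minimal sketch, assuming a hypothetical `fetch_status` callable that wraps `GET /jobs/{job_id}`; the state names match the lifecycle above, and the backoff parameters are illustrative.

```python
import time

# Terminal states from the lifecycle: no further transitions expected.
TERMINAL = {"succeeded", "failed", "canceled"}

def wait_for_job(fetch_status, max_wait=60.0, base_delay=1.0, max_delay=8.0):
    """Poll a job until it reaches a terminal state, with exponential backoff.

    fetch_status is any callable returning the current status string,
    e.g. a thin wrapper around GET /jobs/{job_id}.
    """
    delay, waited = base_delay, 0.0
    while waited < max_wait:
        status = fetch_status()
        if status in TERMINAL:
            return status
        time.sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # back off to spare the server

    raise TimeoutError("job did not finish within max_wait")

# Simulated server: the job finishes after a few polls.
responses = iter(["queued", "running", "running", "succeeded"])
print(wait_for_job(lambda: next(responses), base_delay=0.01))  # succeeded
```

The exponential backoff is not optional politeness; as the capacity discussion later shows, naive fixed-interval polling is a real load source.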
Make retries safe with explicit idempotency rules
Asynchronous workflows amplify retries.
Clients retry because timeouts happen. Workers retry because transient errors happen. Message brokers redeliver because that is how reliability works. Without guardrails, you will process the same logical request twice.
For any “start work” endpoint, support an idempotency key and define its behavior clearly:
- Same idempotency key, endpoint, and authenticated principal means the same logical request.
- Retrying with the same key returns the original job ID and original response.
- If the same key is reused with a different payload, reject it with a clear error.
Stripe’s approach, replaying the original response for a given key, even for errors, is effective because it eliminates ambiguity. The client never has to guess whether the first attempt partially succeeded.
Also, define operational boundaries. How long do you retain idempotency keys: 24 hours, seven days, 30 days? Do they apply to failed responses? If this is undocumented, your support team will end up reverse-engineering behavior from logs.
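These rules fit in a small sketch. The names here (`IdempotencyStore`, `start_job`) are illustrative; a real implementation would persist keys with the retention window discussed above and scope them per authenticated principal.

```python
import hashlib
import json
import uuid

class IdempotencyStore:
    """In-memory sketch of server-side idempotency-key handling."""

    def __init__(self):
        self._seen = {}  # key -> (payload_hash, stored_response)

    def start_job(self, key, payload):
        payload_hash = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest()
        if key in self._seen:
            stored_hash, response = self._seen[key]
            if stored_hash != payload_hash:
                # Same key, different payload: reject loudly.
                return {"status": 409, "error": "idempotency_key_conflict"}
            # Same logical request: replay the original response verbatim.
            return response
        response = {"status": 202, "job_id": str(uuid.uuid4())}
        self._seen[key] = (payload_hash, response)
        return response

store = IdempotencyStore()
first = store.start_job("key-1", {"report": "sales"})
retry = store.start_job("key-1", {"report": "sales"})
assert retry == first  # retry replays the original response
conflict = store.start_job("key-1", {"report": "other"})
assert conflict["error"] == "idempotency_key_conflict"
```

Note that the stored response is replayed as-is, in the spirit of the Stripe behavior described above: the client gets exactly what the first attempt produced, never a second interpretation.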
Choose your completion channel deliberately
You have three common options for signaling completion: polling, webhooks, or both.
Here is the trade space in one view:
| Option | Best when | Main risk |
|---|---|---|
| Polling | Simplicity, firewall-friendly clients | Excess traffic, thundering herd |
| Webhooks | Near real-time updates | Delivery failures, replay issues |
| Hybrid | Need reliability and low latency | More surface area to maintain |
Most mature platforms support both. Polling is the safety net. Webhooks reduce load and latency.
Consider a concrete example. Suppose you process 10,000 jobs per hour at peak. That is about 2.78 jobs per second. If clients poll every two seconds and the average job takes 30 seconds, each job generates roughly 15 polls.
That is 150,000 status requests per hour.
150,000 divided by 3,600 seconds equals about 41.7 additional requests per second.
If your system comfortably handles 500 RPS, that is noise. If you budgeted for 60 RPS total, that is your margin gone. This is why polling strategy, backoff guidance, and webhook adoption are not academic concerns. They directly affect your capacity model.
If you implement webhooks, treat them like a product:
- Sign each request.
- Include an event ID for deduplication.
- Retry with exponential backoff.
- Provide a way to replay missed events.
- Document expected response codes and timeouts.
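The first two items on that list can be sketched with the standard library alone. The secret value and payload shape below are illustrative assumptions; the signing scheme itself (HMAC-SHA256 over the raw body) is a common, widely used choice.

```python
import hashlib
import hmac

# Hypothetical shared secret, exchanged when the webhook is registered.
SECRET = b"whsec_example_secret"

def sign(body: bytes, secret: bytes = SECRET) -> str:
    """Sender side: HMAC-SHA256 signature over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify(body: bytes, signature: str, secret: bytes = SECRET) -> bool:
    """Receiver side: constant-time comparison defeats timing attacks."""
    return hmac.compare_digest(sign(body, secret), signature)

body = b'{"event_id": "evt_123", "status": "succeeded"}'
sig = sign(body)
assert verify(body, sig)
assert not verify(b'{"tampered": true}', sig)
```

Signing the raw bytes, not a re-serialized JSON object, matters: receivers must verify before parsing, since parsing and re-serializing can reorder keys and break the signature.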
If your ecosystem is event-heavy, adopting a standard event envelope such as CloudEvents can help keep metadata consistent across teams and services.
A practical five-step build plan
Step 1: Define and test your state machine
Write down the allowed states and transitions. Unit test illegal transitions. Store transition timestamps. Make terminal states explicit. Your status endpoint should always return a machine-readable status plus an optional human-readable context.
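A minimal sketch of such a state machine, using the states from earlier; the class and field names are illustrative:

```python
import time

# Allowed transitions; terminal states have no outgoing edges.
TRANSITIONS = {
    "queued": {"running", "canceled"},
    "running": {"succeeded", "failed", "canceled"},
    "succeeded": set(),
    "failed": set(),
    "canceled": set(),
}

class Job:
    def __init__(self):
        self.status = "queued"
        self.history = [("queued", time.time())]  # timestamped transitions

    def transition(self, new_status):
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(
                f"illegal transition {self.status} -> {new_status}"
            )
        self.status = new_status
        self.history.append((new_status, time.time()))

job = Job()
job.transition("running")
job.transition("succeeded")
try:
    job.transition("running")  # succeeded is terminal
except ValueError as err:
    print(err)
```

Because the transition table is data, the unit tests for illegal transitions write themselves: iterate every (state, state) pair and assert that anything outside the table raises.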
Step 2: Always return 202 with a stable status URL
When work is not complete, return 202 and include a stable location for status checks. Never force clients to reconstruct state from logs or guess based on timeouts.
Step 3: Enforce idempotency server-side
Accept an idempotency key on write operations. Persist the first response. Reject conflicting reuses. Log key usage so support can trace incidents quickly.
Step 4: Standardize event and error envelopes
Whether you use webhooks or not, define a consistent JSON envelope for events and errors. Include:
- A unique event or job ID
- A correlation ID
- A timestamp
- A stable error code
Stable error codes matter more than verbose messages. Clients automate against codes.
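A minimal envelope builder reflecting these four fields; all field names here are illustrative assumptions, not a standard:

```python
import uuid
from datetime import datetime, timezone

def make_event(event_type, job_id, correlation_id, error_code=None):
    """Build a consistent event envelope (hypothetical field names)."""
    return {
        "event_id": str(uuid.uuid4()),     # unique, enables deduplication
        "type": event_type,
        "job_id": job_id,
        "correlation_id": correlation_id,  # echoed from job creation
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "error_code": error_code,          # stable code clients automate on
    }

evt = make_event("job.failed", "job_42", "corr_7",
                 error_code="quota_exceeded")
assert evt["error_code"] == "quota_exceeded"
```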
Step 5: Build observability for humans, not just dashboards
Async systems fail slowly and invisibly. Add:
- `retry_count` and `last_error` fields in job status
- Clear terminal error categories
- Correlation IDs returned on job creation and echoed in all related events
When a customer says, “My report never arrived,” your support engineer should be able to query a single job ID and see the entire lifecycle.
FAQ
When should you use 202 instead of 201?
Use 201 Created when the resource is fully created during the request. Use 202 Accepted when processing continues after the response and the final outcome is not yet known.
Do you always need webhooks?
No. For low-volume systems or internal tools, polling may be enough. Webhooks become compelling when volume grows, latency matters, or clients cannot afford aggressive polling.
What is the most common design mistake?
Treating retries as edge cases. In asynchronous systems, retries are normal behavior. Design for them explicitly.
Is a formal event standard required?
Not required. It is helpful once multiple teams or services publish and consume events, because consistency reduces integration drift.
Honest Takeaway
Designing asynchronous APIs is mostly about respecting time and uncertainty. If work takes longer than a normal request window, acknowledge that reality. Return 202. Expose a job resource. Make retries safe. Make completion observable.
If you get two things right, you are ahead of most teams: enforce idempotency on the server, and provide a clear, durable way to track or receive completion. Everything else is refinement. Those two decisions determine whether your API feels predictable under pressure or fragile when it matters most.

