Why Your AI Agent's Retry Logic Is Making Things Worse
Naive retries on non-idempotent agent actions multiply side effects and cost. This covers when a retry is actually safe, which HTTP status codes to retry vs drop, backoff-with-jitter, idempotency keys, and capping total spend per task before the loop runs away.
I once watched a retry loop send 14 Slack messages in 11 minutes because a tool call returned a timeout after it had already succeeded. The messages were real. The error was noise. That run cost $0.43 in tokens and a conversation with a very confused user.
When is a retry safe?
A retry is safe when the action is idempotent: running it twice produces the same result as running it once. Reading context, fetching a URL, calling a pure lookup tool. None of those accumulate. Run them ten times and the world stays the same.
The danger zone is write operations: sending a message, creating a record, charging a card, calling a webhook, firing off an email. If the tool call executed, fully or partially, before the error surfaced, a retry repeats whatever already ran. Your agent does not know the difference between "the request never reached the server" and "the server processed it and the acknowledgment got dropped." Both look like a timeout from the client side.
The working definition I use: an action is retryable if you can run it N times and the state after all N runs equals the state after one successful run. If you cannot guarantee that, you need an idempotency key or a dead-letter queue rather than a bare retry.
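That definition can be turned into a quick check. The sketch below is only an illustration: it assumes an action can be modeled as a pure function over some state, and `isIdempotentOn`, `Inbox`, `setStatus`, and `sendMessage` are hypothetical names, not part of any library.

```typescript
// An action modeled as a pure function from state to new state (an assumption
// made for illustration; real tool calls have side effects you cannot replay).
type Action<S> = (state: S) => S;

// True if applying the action `runs` times leaves the same state as applying it once.
function isIdempotentOn<S>(action: Action<S>, initial: S, runs = 3): boolean {
  const once = action(initial);
  let repeated = initial;
  for (let i = 0; i < runs; i++) {
    repeated = action(repeated);
  }
  return JSON.stringify(once) === JSON.stringify(repeated);
}

interface Inbox {
  messages: string[];
  flags: Record<string, string>;
}

// Sets a value: repeating it changes nothing further.
const setStatus: Action<Inbox> = (s) => ({
  ...s,
  flags: { ...s.flags, deploy: "done" },
});

// Appends a value: every run adds another message.
const sendMessage: Action<Inbox> = (s) => ({
  ...s,
  messages: [...s.messages, "deploy finished"],
});

const emptyInbox: Inbox = { messages: [], flags: {} };
console.log(isIdempotentOn(setStatus, emptyInbox)); // true
console.log(isIdempotentOn(sendMessage, emptyInbox)); // false
```

The "set" shape passes and the "append" shape fails, which is the same split as reads versus writes above.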
Which errors should you actually retry?
The HTTP status code tells you most of what you need to know. Anthropic's API is explicit about this.
| Status | Name | Retry? | Why |
|---|---|---|---|
| 400 | invalid_request_error | No | Your payload is malformed. Retrying sends the same bad request. |
| 401 | authentication_error | No | Wrong API key. Fix it, don't loop. |
| 403 | permission_error | No | Your key doesn't have access. Same outcome on retry. |
| 404 | not_found_error | No | The resource is gone or never existed. |
| 429 | rate_limit_error | Yes, with backoff | You hit a per-minute limit on requests (RPM), input tokens (ITPM), or output tokens (OTPM). Wait and retry. |
| 500 | api_error | Sometimes | Transient server fault. One or two retries with backoff, then give up. |
| 529 | overloaded_error | Yes, with longer backoff | Anthropic's infrastructure is under load. Not your quota. Keep retrying with the longer floor, or fail over to Bedrock/Vertex. |
| Timeout | (network) | Depends on action | Only retry if the action is idempotent. Otherwise, check for a partial execution first. |
429 and 529 look similar but are not the same problem. A 429 means your account hit a limit; a 529 means Anthropic's side is saturated across all users. The 529 has no reliable retry-after header, so your backoff floor needs to be higher. I start at 5 seconds for 529 vs 1 second for 429.
The other 4xx errors in the table (400, 401, 403, 404) should never trigger a retry loop. If you see them appearing in retry logs, you have a bug in your request construction. The failure is not transient.
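The table above can be encoded as a small decision function. This is a sketch under my own naming: `RetryDecision` and `classifyStatus` are hypothetical, the floors and retry counts are the ones quoted in this post, and any status the table does not cover is treated conservatively as non-retryable.

```typescript
type RetryDecision =
  | { retry: false; reason: string }
  | { retry: true; floorMs: number; maxRetries: number };

// `httpStatus` is undefined for a network timeout with no response at all.
function classifyStatus(
  httpStatus: number | undefined,
  isIdempotent: boolean
): RetryDecision {
  if (httpStatus === undefined) {
    return isIdempotent
      ? { retry: true, floorMs: 2000, maxRetries: 5 }
      : { retry: false, reason: "timeout on a non-idempotent action: check state first" };
  }
  if (httpStatus === 429) return { retry: true, floorMs: 1000, maxRetries: 5 };
  if (httpStatus === 529) return { retry: true, floorMs: 5000, maxRetries: 5 };
  // Transient server fault: one or two retries, then give up.
  if (httpStatus === 500) return { retry: true, floorMs: 1000, maxRetries: 2 };
  if ([400, 401, 403, 404].includes(httpStatus)) {
    return { retry: false, reason: "request bug, not a transient failure" };
  }
  return { retry: false, reason: "status not covered by the table" };
}

console.log(classifyStatus(529, true));
console.log(classifyStatus(400, true));
```

Returning a reason string for the "no" cases makes it easy to log why a call was dropped instead of retried.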
How do you stop a retry storm?
Exponential backoff with jitter is the standard answer, and it is correct. The jitter part is the piece most implementations skip.
Without jitter, every client that hit the same 429 at the same moment backs off to exactly the same interval and then hammers the API again simultaneously. You recreate the spike that caused the limit hit. With jitter you spread the retries across a window and the burst dissolves.
Here is the pattern I use for Anthropic tool call retries:
```typescript
const BASE_MS = 1000;
const MAX_MS = 60_000;
const MAX_ATTEMPTS = 5; // retry ceiling: at most 5 retries after the first call

async function callWithRetry<T>(
  fn: () => Promise<T>,
  isIdempotent: boolean,
  attempt = 0
): Promise<T> {
  try {
    return await fn();
  } catch (err: any) {
    const status = err?.status ?? err?.statusCode;

    // Non-retryable: bad request, auth, permissions, logic errors
    if ([400, 401, 403, 404].includes(status)) throw err;

    // Non-idempotent actions get one attempt only
    if (!isIdempotent) throw err;

    // Hard stop once the retry ceiling is reached
    if (attempt >= MAX_ATTEMPTS) throw err;

    // 529 overloaded: longer floor
    const floor = status === 529 ? 5000 : BASE_MS;
    const exp = Math.min(floor * 2 ** attempt, MAX_MS);
    // Up to 30% random jitter so concurrent clients do not retry in lockstep
    const jitter = Math.random() * exp * 0.3;
    const delay = exp + jitter;

    await new Promise((r) => setTimeout(r, delay));
    return callWithRetry(fn, isIdempotent, attempt + 1);
  }
}
```
The isIdempotent flag is the key discipline. You make the call site declare whether the action can safely repeat. If the caller cannot answer that question, the default should be one attempt.
After three years of agent logs, the pattern I keep seeing is not "too few retries" but "too many retries on the wrong errors." A 400 in a retry loop is a bug masquerading as resilience.
How do you make non-idempotent tool calls safe to retry?
Idempotency keys. Before the tool call, generate a stable key from the inputs (a hash of the action type plus its parameters works). Pass that key to the downstream service. If the service sees the same key twice, it returns the result of the first execution instead of running the action again.
Not every service supports this, but Stripe does, most email APIs have a Message-ID, and you can implement it yourself on any internal tool by checking a short-lived cache for the key before performing the side effect and storing (key, result) once it completes. The cache TTL just needs to cover your retry window, so 5 minutes is usually enough.
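For an internal tool, a minimal version of this might look like the sketch below. It assumes a single process and an in-memory map; `stableStringify`, `idempotencyKey`, and `IdempotentExecutor` are hypothetical names, and only `createHash` from Node's built-in `node:crypto` module is a real API.

```typescript
import { createHash } from "node:crypto";

// Serialize with sorted object keys so equal parameters always hash the same.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value) ?? "null";
}

// Stable key: hash of the action type plus its parameters.
function idempotencyKey(actionType: string, params: unknown): string {
  return createHash("sha256")
    .update(`${actionType}:${stableStringify(params)}`)
    .digest("hex");
}

// In-memory (key, result) store with a TTL that covers the retry window.
class IdempotentExecutor<R> {
  private readonly entries = new Map<string, { result: R; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs = 5 * 60_000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  run(key: string, action: () => R): R {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) return hit.result;
    const result = action(); // the side effect runs here, at most once per live key
    this.entries.set(key, { result, expiresAt: this.now() + this.ttlMs });
    return result;
  }
}
```

If `R` is a promise, the promise itself is cached, so a retry that arrives while the first call is still in flight gets the same pending result. One caveat of hashing only the inputs: two deliberate, identical actions inside the TTL are treated as one, so you may want to mix a task or step identifier into the key.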
For agent tool calls where you control the tool implementation, this is the cleanest fix. For third-party tools you cannot modify, the safer path is to check state before retrying. If the action was "send a message," query for recent messages before the second attempt.
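The check-before-retry path might be sketched like this. `MessageClient` is a hypothetical interface standing in for whatever third-party tool you are wrapping; the matching rule (same text since the attempt started) is an assumption and will need to be stricter for real traffic.

```typescript
// Hypothetical shape of a third-party messaging tool you cannot modify.
interface MessageClient {
  listRecent(channel: string, sinceMs: number): Promise<{ text: string }[]>;
  send(channel: string, text: string): Promise<void>;
}

async function sendWithStateCheck(
  client: MessageClient,
  channel: string,
  text: string,
  now: () => number = Date.now
): Promise<"sent" | "already_sent"> {
  const startedAt = now();
  try {
    await client.send(channel, text);
    return "sent";
  } catch {
    // Ambiguous failure: the message may or may not have gone out.
    // Look at the actual state before sending a second time.
    const recent = await client.listRecent(channel, startedAt);
    if (recent.some((m) => m.text === text)) return "already_sent";
    await client.send(channel, text); // a second failure propagates to the caller
    return "sent";
  }
}
```

This allows exactly one follow-up attempt, and only after the state query says the first one did not land.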
How do you cap total spend per task?
Retry loops compound cost in two ways: the retries themselves consume tokens, and if the retries succeed and the task keeps running, the downstream work compounds on top. A task that was priced at $0.08 can finish at $0.60 if it retried four times, continued running, and had a few tool calls that fired twice.
The controls I use:
- Per-task token budget: track cumulative input + output tokens against a ceiling. Abort cleanly when the ceiling is hit. Anthropic's token counting endpoint lets you measure before you send.
- Retry ceiling: MAX_ATTEMPTS = 5 is a hard stop. The loop never exceeds it. Log every retry with its reason and cost delta.
- Idempotency window: keep a short-lived store of completed tool call keys. Before any write operation, check whether it already ran in this task.
Running all three together, the worst-case cost for a task is bounded: roughly base_cost * (1 + retry_factor) * MAX_ATTEMPTS, and never more than the token ceiling allows. Without them it is unbounded.
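The three controls can live in one small per-task tracker. The sketch below is an assumption about how you might wire them together, not a prescribed design: `TaskBudget` and `BudgetExceededError` are hypothetical names, and the token counts are whatever your API responses report.

```typescript
class BudgetExceededError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "BudgetExceededError";
  }
}

class TaskBudget {
  private usedTokens = 0;
  private readonly maxTokens: number;
  private readonly maxRetries: number;
  private readonly completedKeys = new Set<string>();
  readonly retryLog: { reason: string; tokens: number }[] = [];

  constructor(limits: { maxTokens: number; maxRetries: number }) {
    this.maxTokens = limits.maxTokens;
    this.maxRetries = limits.maxRetries;
  }

  // Per-task token budget: cumulative input + output tokens against a ceiling.
  recordUsage(inputTokens: number, outputTokens: number): void {
    this.usedTokens += inputTokens + outputTokens;
    if (this.usedTokens > this.maxTokens) {
      throw new BudgetExceededError(
        `token ceiling hit: ${this.usedTokens} > ${this.maxTokens}`
      );
    }
  }

  // Retry ceiling: log each retry with its reason and token cost, then count it.
  recordRetry(reason: string, tokens: number): void {
    if (this.retryLog.length >= this.maxRetries) {
      throw new BudgetExceededError(`retry ceiling hit: ${this.maxRetries}`);
    }
    this.retryLog.push({ reason, tokens });
    this.recordUsage(tokens, 0);
  }

  // Idempotency window: has this write already run in this task?
  alreadyRan(key: string): boolean {
    return this.completedKeys.has(key);
  }

  markCompleted(key: string): void {
    this.completedKeys.add(key);
  }

  remainingTokens(): number {
    return Math.max(0, this.maxTokens - this.usedTokens);
  }
}
```

The agent loop catches `BudgetExceededError` once, at the top, and aborts cleanly there instead of checking limits at every call site.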
If you ship one change after reading this, make non-idempotent tool calls declare themselves at the call site. Everything else follows from that.
FAQ
What is the difference between a 429 and a 529 from Anthropic?
A 429 (rate_limit_error) means your account exceeded a per-minute limit on requests, input tokens, or output tokens. A 529 (overloaded_error) means Anthropic's infrastructure is saturated across all users and has nothing to do with your quota. The retry behavior should differ: for a 429, honor the retry-after header; for a 529, use a longer floor and occasionally fail over to Bedrock or Vertex.
How long should I wait before the first retry?
For 429, start at 1 second and check the retry-after header first if present. For 529, start at 5 seconds. For network timeouts on idempotent actions, 2 seconds. Always add jitter of 20-30% of the computed delay so concurrent clients do not synchronize.
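Those numbers can be folded into one helper. This is a sketch with hypothetical names (`parseRetryAfterMs`, `firstRetryDelayMs`); it assumes the header value follows the standard HTTP Retry-After format, which is either a number of seconds or an HTTP date, and it never waits less than the floor even if the header suggests a shorter delay.

```typescript
// Parse a Retry-After value: delay in seconds, or an HTTP date.
function parseRetryAfterMs(
  header: string | null | undefined,
  nowMs: number = Date.now()
): number | undefined {
  const value = header?.trim();
  if (!value) return undefined;
  const seconds = Number(value);
  if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  const dateMs = Date.parse(value);
  if (!Number.isNaN(dateMs)) return Math.max(0, dateMs - nowMs);
  return undefined;
}

// `httpStatus` is undefined for a network timeout on an idempotent action.
function firstRetryDelayMs(
  httpStatus: number | undefined,
  retryAfter: string | null | undefined,
  random: () => number = Math.random
): number {
  const floorMs = httpStatus === 529 ? 5000 : httpStatus === 429 ? 1000 : 2000;
  // Only the 429 path trusts the header; 529 relies on the longer floor.
  const hintedMs = httpStatus === 429 ? parseRetryAfterMs(retryAfter) : undefined;
  const baseMs = Math.max(floorMs, hintedMs ?? 0);
  // Jitter of 20-30% of the computed delay.
  const jitterFraction = 0.2 + random() * 0.1;
  return Math.round(baseMs * (1 + jitterFraction));
}

console.log(firstRetryDelayMs(429, "3")); // somewhere between 3600 and 3900
```

Passing `random` as a parameter keeps the function deterministic in tests.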
Should I retry on tool call errors inside the agent loop?
Only if the tool error is a transport failure (timeout, 5xx from the tool server). If the tool returned a structured error (wrong input, resource not found, permission denied), that is a logic error and the agent should handle it through its reasoning instead of retrying the same call. Retrying a tool that returned "user not found" just runs the same failing lookup five times.
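As a rough sketch, that split can be a single function. The `ToolCallOutcome` shape and the `nextStepFor` name are hypothetical; your tool layer will report failures in its own format.

```typescript
// Hypothetical summary of how a tool call failed.
interface ToolCallOutcome {
  timedOut?: boolean;
  httpStatus?: number;
  toolError?: { code: string; message: string };
}

type NextStep = "retry_with_backoff" | "return_to_model";

function nextStepFor(outcome: ToolCallOutcome): NextStep {
  // The tool answered with a structured error: a logic problem, not a transport one.
  if (outcome.toolError !== undefined) return "return_to_model";

  const serverFault =
    outcome.httpStatus !== undefined && outcome.httpStatus >= 500;
  // Transport failure: timeout or 5xx from the tool server.
  // The idempotency rule still applies before the retry actually runs.
  if (outcome.timedOut === true || serverFault) return "retry_with_backoff";

  return "return_to_model";
}

console.log(
  nextStepFor({ toolError: { code: "user_not_found", message: "no such user" } })
); // "return_to_model"
console.log(nextStepFor({ timedOut: true })); // "retry_with_backoff"
```

Anything that is not clearly a transport failure goes back to the model, so the agent can change its input instead of repeating it.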