I Spent Hours Debugging a 429 Error on AWS Bedrock That Wasn't Even My Fault

I was trialing Claude Code on Amazon Bedrock. We use a shared AWS account at work — one account, a few of us on the team, each with our own API key. I wanted to see if Bedrock was worth using over a direct Anthropic subscription before committing to it.

I set it up, ran my first task, and got this:

ThrottlingException: Too many tokens per day, please wait before trying again.

First request of the day. I hadn't done anything yet.

I switched regions. Same error. I checked the Service Quotas dashboard — the numbers looked fine on paper. I googled for 30 minutes and ended up in a pile of AWS re:Post threads where everyone was confused and nobody had a clean answer. At that point I genuinely thought I'd misconfigured something obvious and was too embarrassed to ask the team.

I hadn't. The problem was buried in how Bedrock counts tokens, and it took me a lot of reading to actually understand it. This post is what I wish someone had sent me before I started.

First, figure out which problem you actually have

I learned this the hard way: the same error message covers two completely different situations.

Situation 1 is a Bedrock provisioning bug. It shows up on new AWS accounts or freshly enabled Bedrock access. Your quota dashboard looks fine but the internal effective limit is near zero. You'll hit 429 on your first request across multiple regions even though you've barely touched anything. This one needs an AWS Support case — you can't fix it yourself. You ask them to check your effective quotas (not just the defaults) for the Claude model you're using, include the x-amzn-requestid header from a failed call, and they usually sort it within a day.

Situation 2 is the one I had, and the one most Claude Code users hit. Your quota is real, but you're burning through it 8x faster than you think. The rest of this post is about that.

The thing Bedrock doesn't tell you upfront

I assumed Bedrock charged you based on tokens you actually use. That's how billing works, sure, but it's not how quota consumption works.

Bedrock uses a reservation system. The moment you send a request, before Claude writes a single word back, Bedrock locks up tokens from your daily quota based on max_tokens — the maximum output your request could generate. When the response finishes, it reconciles and only bills you for actual usage. But the reservation already happened.

The AWS documentation is clear about this:

"The max_tokens value is deducted from your quota at the beginning of each request."

So if max_tokens is 64,000 (the default for Claude Sonnet 4 when you don't set it), Bedrock locks 64,000 tokens from your quota the moment you hit send — even if Claude's actual response is 200 words.

There's a second layer to this. Claude Sonnet 4 and later models have a 5x burndown rate on output tokens. One output token consumes five tokens from your quota. So the real formula Bedrock uses upfront is:

Reserved = input tokens + (max_tokens × 5)

For a typical request with 1,000 input tokens and an unset max_tokens:

1,000 + (64,000 × 5) = 321,000 tokens reserved

Per request. Before Claude says anything.

If your daily quota is 500,000 tokens, one request just consumed 64% of it. Your second request probably tips you over.
Token Reservation flow
(Diagram 1 goes here — the token reservation flow)

Why Claude Code specifically destroys your quota

A single Claude Code task isn't one API call. It's a chain of them. Claude reads a file, writes some code, runs a tool, checks the output, reads another file, loops back. A refactor that feels like one interaction might be 20 or 30 API calls behind the scenes.

By default, CLAUDE_CODE_MAX_OUTPUT_TOKENS is 32,000. Every single background call reserves 32,000 × 5 = 160,000 tokens upfront. Run one multi-step coding task and you can wipe your daily quota before the task even finishes.

There's also a documentation gap that Anthropic's own users flagged. The general Claude Code settings page says the default is 32,000. The Bedrock-specific page recommends 4,096. Most people — including me — set up using the general docs and never see the Bedrock note. GitHub issue #18963 is exactly this complaint.

The shared account problem

This part is specific to my situation, but I suspect it's common.

We run a shared AWS account. A few of us use Claude Code from it, each with our own API key. I assumed our keys gave us separate quota pools. They don't. API keys in Bedrock are for authentication, not isolation. Every one of us pushes reservations against the same single daily quota.

So if someone on the team runs a big refactor in the morning, the rest of us start hitting 429 by lunch. Not because we've done anything wrong, but because one person's Claude Code session reserved hundreds of thousands of tokens out of a shared bucket.

(Diagram 2 goes here — shared account draining)

The fix for the shared account situation is still the same token setting below, but longer term the right move is separate AWS accounts per developer if your team uses this heavily. Bedrock quotas are per-account, so that's the only real way to isolate usage.

The fix

Set this before running Claude Code:

export CLAUDE_CODE_MAX_OUTPUT_TOKENS=4096

The official Anthropic Bedrock docs recommend this specifically:

"CLAUDE_CODE_MAX_OUTPUT_TOKENS=4096: Bedrock's burndown throttling logic sets a minimum of 4096 tokens as the max_token penalty. Setting this lower won't reduce costs but may cut off long tool uses, causing the Claude Code agent loop to fail persistently."

4,096 is the floor Bedrock uses regardless. Going lower than that doesn't save quota — it just risks cutting Claude off mid-response on longer tasks. But going from 32,000 to 4,096 drops your per-request reservation from 160,000 tokens to 20,480. That's roughly 8x more tasks from the same quota.

Billing doesn't change. You only pay for tokens Claude actually generates. The reservation is what changes, and that's what was killing you.

(Diagram 3 goes here — before vs after comparison)

Full working setup

If you want to stop exporting variables every terminal session, put this in your Claude Code settings file at ~/.claude/settings.json:

{
  "env": {
    "CLAUDE_CODE_USE_BEDROCK": "1",
    "AWS_REGION": "us-east-1",
    "CLAUDE_CODE_MAX_OUTPUT_TOKENS": "4096",
    "DISABLE_PROMPT_CACHING": "1",
    "MAX_THINKING_TOKENS": "1024",
    "ANTHROPIC_MODEL": "us.anthropic.claude-sonnet-4-20250514-v1:0",
    "ANTHROPIC_SMALL_FAST_MODEL": "us.anthropic.claude-3-5-haiku-20241022-v1:0"
  }
}

DISABLE_PROMPT_CACHING=1 helps with some quota accounting quirks specific to Bedrock. MAX_THINKING_TOKENS=1024 keeps extended thinking from chewing through your budget on every call.

Still getting 429 after all this?

Two more things to check.

Parallel tool calls. Even with the token fix, Claude Code can spike your in-flight reservations if it fires multiple tool calls at the same time. A developer on AWS re:Post reported this exactly — they were well under their 200k TPM limit in total, but running 10 parallel LLM calls caused a short reservation spike that tripped throttling. Dropping to 5 parallel calls fixed it. You can control this in Claude Code with:

export CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY=1

Set it to 1 and all tool calls run sequentially. Slower, but it won't spike your quota.

Different region. us-east-1 and us-west-2 have the largest capacity pools. If you're running from eu-west-2 or ap-southeast-1, try switching. Each region has independent quotas, so what's exhausted in one is untouched in another.

What I'd tell someone starting out

Use a separate AWS account per developer if you can. It's a few minutes of setup and it means one person's heavy session doesn't tank everyone else's day.

Set CLAUDE_CODE_MAX_OUTPUT_TOKENS=4096 before your first run. Don't wait to hit 429 to figure this out.

Check your Service Quotas dashboard in the AWS console and look at the actual token-per-day and token-per-minute numbers for the Claude model you're using. If anything shows 0 or looks wrong, open a support case before spending more time debugging locally.

The frustrating part is that none of this is obvious from the Claude Code setup guide or the Bedrock onboarding. You have to connect dots across an AWS blog post, a Bedrock docs page, a few GitHub issues, and some re:Post threads. I did that so you don't have to.