xiand.ai
Apr 13, 2026 · Updated 07:48 PM UTC
AI

Anthropic denies cache changes driving Claude Code quota depletion

Anthropic claims recent reductions in prompt cache time-to-live are not the cause of users hitting usage limits faster.

Alex Chen

Anthropic is disputing claims that recent changes to its prompt cache settings are responsible for users hitting usage quotas more quickly. The AI company previously reduced the time-to-live (TTL) for the Claude Code prompt cache from one hour to five minutes for many requests.

Developer Sean Swanson filed a bug report alleging that the shift back to a five-minute cache disproportionately punishes long-session, high-context users. Swanson, a long-term subscriber, said he had never hit a quota limit before March and claims the current burn rate is making the service unusable.

The cost of context

Prompt caching is designed to save costs by avoiding the reprocessing of existing data, such as codebases or background instructions. Writing to a five-minute cache costs 25 percent more per token than sending the same tokens as a standard uncached prompt, but reading from the cache costs only about 10 percent of the base price.

Jarred Sumner, an Anthropic employee and creator of the Bun JavaScript runtime, argued that the change actually makes Claude Code cheaper for many users. Sumner noted that a significant portion of requests are one-shot calls where the cached context is used only once and never revisited.
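The trade-off Sumner describes can be worked through with rough arithmetic. The sketch below is illustrative only: it uses the two multipliers quoted above (1.25x for a five-minute cache write, about 0.1x for a read), and it assumes a 2x multiplier for a one-hour cache write, a figure this article does not state. The function name and the 100,000-token context are hypothetical.

```python
# Relative prices, in units of the base input-token price.
WRITE_5M = 1.25  # from the article: 25 percent above the base price
READ = 0.10      # from the article: about 10 percent of the base price
WRITE_1H = 2.00  # assumption: not stated in the article


def cached_cost(context_tokens, reuses, write_multiplier):
    """Cost, in base-price token units, of writing a context to the
    cache once and then reading it back `reuses` times."""
    return context_tokens * (write_multiplier + reuses * READ)


context = 100_000  # hypothetical context size

# One-shot call: the context is written once and never read again.
one_shot_5m = cached_cost(context, 0, WRITE_5M)
one_shot_1h = cached_cost(context, 0, WRITE_1H)
print(one_shot_5m, one_shot_1h)
```

Under these assumptions, a one-shot request pays only the write premium, so the cheaper five-minute write comes out ahead. The comparison changes for sessions that return to the same context after a pause longer than five minutes, because each return then pays for a fresh write.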

However, the massive 1-million-token context window available in Claude Opus and Sonnet models can lead to expensive cache misses. Claude Code creator Boris Cherny stated that leaving a session idle for over an hour often results in a full cache miss, triggering high costs.
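To see why idle gaps matter, here is a minimal sketch that counts how many requests in a session would miss the cache for a given TTL. It is a toy model, not Anthropic's implementation: it assumes each request refreshes the cache timer and that a request misses whenever the gap since the previous request exceeds the TTL. The function name and the timestamps are hypothetical.

```python
def count_cache_misses(request_times_min, ttl_min):
    """Count requests that miss the cache.

    Assumption: a request misses if it is the first one, or if more than
    `ttl_min` minutes have passed since the previous request. Every
    request is assumed to reset the timer.
    """
    misses = 0
    previous = None
    for t in request_times_min:
        if previous is None or t - previous > ttl_min:
            misses += 1
        previous = t
    return misses


# Hypothetical session: bursts of activity separated by pauses (minutes).
session = [0, 2, 4, 12, 14, 80]

misses_5m = count_cache_misses(session, ttl_min=5)
misses_1h = count_cache_misses(session, ttl_min=60)
print(misses_5m, misses_1h)
```

In this toy session the eight-minute pause causes a miss under a five-minute TTL but not under a one-hour TTL, while the pause of more than an hour causes a miss in both cases. With a very large context, each miss means the whole context is processed and written again.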

To mitigate these costs, Cherny said Anthropic is investigating a default 400,000-token context window. He noted that users are increasingly pulling in larger amounts of data, such as multiple skills or background automations, which increases the likelihood of hitting limits.
