The Throttle
Anthropic and OpenAI are both rationing compute this week. I asked Claude to pull the thread on why. Here's what came back.
Anthropic and OpenAI both offer “fast mode” for their coding tools. Same concept, different packaging. Anthropic charges 6x the standard API price for a 2.5x speed improvement.1 OpenAI charges 2x for faster responses, though they don’t publish a specific speed multiplier.2 Both are rationing the same finite resource: inference compute. The difference is in how they dress it up.
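The multipliers are simple arithmetic on the list prices cited in the footnotes. A quick sketch (the 2.5x speedup figure is the one Anthropic cites; OpenAI publishes no equivalent number, so the premium-per-speedup comparison only works on one side):

```python
# Back-of-envelope "fast mode" premium math, using list prices from
# the footnotes (dollars per million tokens, input/output).
opus_standard = {"input": 5.00, "output": 25.00}   # Opus 4.6 standard
opus_fast = {"input": 30.00, "output": 150.00}     # Claude Fast Mode

for kind in ("input", "output"):
    multiplier = opus_fast[kind] / opus_standard[kind]
    print(f"Anthropic fast-mode {kind} premium: {multiplier:.0f}x")

# Anthropic's cited speedup is ~2.5x, so the cost multiple per unit
# of speedup is 6 / 2.5. OpenAI's Priority Processing is a flat 2x
# with no published speedup, so no equivalent ratio exists for it.
print(f"Cost multiple per unit of speedup: {6 / 2.5:.1f}x")
```

Six times the price for two and a half times the speed: you pay roughly 2.4x per unit of speedup, which is the bluntness the next paragraph describes.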
Anthropic’s approach is blunt. You want fast, you pay API rates. Your Max subscription doesn’t cover it. OpenAI wraps the same constraint in a feature. “You get fast mode! It just costs 2x your allocation.” Same ceiling, but one feels like a restriction and the other feels like a choice.
I didn’t have a thesis. I had a hunch. So I did what I do now: I opened Claude Code and dispatched a handful of agents to pull threads. Research both companies. Validate everything against primary sources. The irony of using Anthropic’s own model to investigate Anthropic’s compute strategy wasn’t lost on me, and I can’t fully correct for it. Take what follows with that in mind.
Both sides, same week
On March 4th, Anthropic pushed an update to Claude Code that dropped Opus 4.6’s default thinking effort from “high” to “medium.”3 That came two days after a 12-hour global outage that took down Claude.ai, the developer console, and Claude Code.4
Jordan Bentley flagged the correlation: token and tool-call volume dropped significantly alongside the thinking level change. The status tracker still labeled it “nominal.” That’s worth noting. Anthropic didn’t announce this as a cost or capacity measure. They shipped it in a changelog. The status page called it a normal day. If we’re keeping score on transparency, quietly reducing output while labeling everything nominal is its own kind of packaging.
That caught my eye. I suspect this is one way to keep your services healthy when they’re suddenly getting hammered. Can’t blame them really. But I could also be pattern-matching where there’s just a routine product update.
The same day, on the OpenAI side:
Pash, who works on Codex at OpenAI, posted that they needed to reshuffle capacity because too many people had turned on fast mode. Same week, same constraint, different companies. One quietly adjusted a default. The other told users to hold tight.
Here’s the thing about adaptive thinking: it’s not just a quality knob. It’s an economics lever. At medium effort, Anthropic’s own docs show Opus 4.5 matching Sonnet 4.5’s best SWE-bench performance while using 76% fewer output tokens.5 At scale, that’s not a minor optimization. That’s the difference between keeping the lights on and not.
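A rough sketch of why a one-word default change is an economics lever. The 76% reduction is from Anthropic's docs; the workload size and the use of the $25/M output list price as a cost proxy are my illustrative assumptions, not anything Anthropic has published:

```python
# Illustrative only: output-token economics of a hypothetical workload
# at two effort levels. The 76% figure is from Anthropic's effort
# docs; the workload size and per-token price are made-up stand-ins.
output_price_per_million = 25.00        # Opus list price, as a proxy
high_effort_tokens = 1_000_000_000      # hypothetical daily output tokens
medium_effort_tokens = int(high_effort_tokens * (1 - 0.76))

high_cost = high_effort_tokens / 1e6 * output_price_per_million
medium_cost = medium_effort_tokens / 1e6 * output_price_per_million
print(f"high effort:   {high_effort_tokens:>13,} tokens -> ${high_cost:,.0f}/day")
print(f"medium effort: {medium_effort_tokens:>13,} tokens -> ${medium_cost:,.0f}/day")
```

Whatever the real per-token cost is, a 76% cut in output tokens at constant benchmark scores scales the compute bill by roughly a quarter. That is why a changelog entry can double as a capacity measure.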
And Pash’s tweet, whether he intended it this way or not, illustrates the flip side. You give people a speed lever, they all pull it, and now you’re reshuffling capacity to handle the demand you created. The generous framing generates the very load that forces the constraint. Or maybe OpenAI just spins up more GPUs and it’s fine. I don’t know. The tweet is suggestive, not conclusive.
Why now
The timing isn’t random. In late February, the Trump administration directed federal agencies to stop using Anthropic’s products and designated the company a “supply chain risk to national security” after Anthropic refused the Pentagon unrestricted use of Claude.6 OpenAI announced a Pentagon deal the same day.7 Public backlash followed: ChatGPT uninstalls jumped 295%,8 and users flooded to Claude. The app hit #1 in the App Store.
I’m not making a values argument about who was right. I’m pointing out what happened to the infrastructure. Anthropic got hit with a demand spike they didn’t plan for, driven by a political situation they didn’t create. Mike Krieger, who’s building at Anthropic, posted today that more than a million people are now signing up for Claude every day.9 That’s the scale of what hit their infrastructure. It’s a plausible explanation for the March 2 outage and the subsequent default change. Not the only explanation. But a plausible one.
The money underneath
Both companies are running the same math. AI-first products carry gross margins of 20-60%, compared to 70-90% for traditional SaaS.10 Every new user generating tokens costs real compute. Your best customers are your most expensive customers.
Anthropic’s annualized revenue run rate went from $9 billion in January to $19 billion by early March.11 Those are run-rate projections, not collected revenue, and hypergrowth numbers are volatile. But even directionally, it’s a staggering growth curve. Claude Code alone is running at $2.5 billion annualized.12 At the same time, Anthropic slashed its gross margin forecast from 50% to 40% because inference costs came in 23% higher than projected.13 Revenue screaming, margins compressing. That tension might explain the blunt pricing as much as any principled commitment to transparency. They might not have the room to be generous even if they wanted to.
OpenAI holds roughly two-thirds of consumer AI traffic14 but their enterprise API market share has been sliding, from 50% to as low as 25%, while Anthropic grew from 12% to 40%.15 OpenAI is projecting $14 billion in losses for 202616 and started running ads in ChatGPT in February.17 Making fast mode feel like a feature rather than a restriction might just be better product design for their position. Or it might be a retention play. Probably both.
Two companies. Same constraint. Different financial pressures producing different packaging. Neither approach is obviously smarter. Both are managing the same fundamental problem: inference is expensive, demand is growing faster than infrastructure, and something has to give.
What I don’t know
I put this post through an adversarial review, and it came back with fair criticisms. The connection between “how you price fast mode” and “what it means about a company” might be thinner than I’m making it. Sometimes a rate limit is just a rate limit. A product team tuning a thinking-effort default after an outage is a mundane operational decision. Framing it as a “tell” does a lot of narrative work.
I also can’t fully separate my observations from the tool I’m using to make them. Claude researched this post. Claude validated the claims. Claude might surface and frame information in ways that favor Anthropic at a level I can’t see. I checked everything I could against primary sources, and every footnote links to the original reporting. But that’s not the same as objectivity.
API prices dropped 60-80% in the last year.18 AI infrastructure capital expenditure is projected to exceed $500 billion this year.19 The financial pressures on both companies are only going to intensify. Something has to give. Either prices go up, quality goes down, or the packaging gets more creative.
We’re already seeing all three. I’m just not sure what it means yet.
Sources
1. Anthropic’s Claude Fast Mode costs $30/$150 per million tokens (input/output) versus standard Opus 4.6 pricing of $5/$25. WinBuzzer, Claude API Docs
2. OpenAI’s Priority Processing charges 2x standard pricing. They describe it as “significantly lower and more consistent latency” without quantifying a speed multiplier. OpenAI API Pricing
3. Claude Code v2.1.68 changelog, released March 4, 2026. GitHub
4. The March 2, 2026 outage lasted approximately 12 hours, caused by a cascading failure in Anthropic’s distributed database layer. TechCrunch
5. Anthropic’s effort parameter documentation. At medium effort, Opus 4.5 matches Sonnet 4.5’s best SWE-bench performance while using 76% fewer output tokens. Claude API Docs
6. Trump directed federal agencies to stop using Anthropic; Defense Secretary Hegseth designated the company a “supply chain risk to national security,” the first time this label has been applied to a U.S. company. CBS News, Just Security
7. OpenAI published “Our agreement with the Department of War.” OpenAI
8. Day-over-day ChatGPT mobile uninstall data from Sensor Tower. TechCrunch
9. Mike Krieger (co-founder of Instagram, now at Anthropic) on March 5, 2026. X
10. AI-first SaaS margins per Bessemer’s “State of AI 2025” and industry analyses. Monetizely
11. Bloomberg reported a $9B annualized revenue run rate in January 2026, growing to $19B by early March 2026. Run rates at hypergrowth companies are volatile and may not reflect sustained revenue. Bloomberg (Jan 21), Bloomberg (Mar 3)
12. Claude Code reached $2.5B annualized revenue by February 2026. Constellation Research
13. Anthropic cut its 2025 gross margin forecast from 50% to 40% due to higher-than-expected inference costs. WebProNews
14. ChatGPT’s consumer AI traffic share is approximately 68% as of January 2026, down from ~87% a year earlier. Similarweb
15. Enterprise LLM API market share from Menlo Ventures (mid-2025: OpenAI 25%, Anthropic 32%) and later data (Anthropic ~40%). Menlo Ventures, Jason Lemkin
16. Per The Information, citing OpenAI’s internal projections. The Information
17. ChatGPT ads began testing February 9, 2026 for free-tier users. OpenAI
18. AI API prices dropped roughly 60-80% from early 2025 to early 2026. DEV Community
19. Goldman Sachs projects AI companies may invest more than $500B in capital expenditure in 2026. Goldman Sachs