🤖AI’s cost explosion needs a Fintech solution

AI is creating new categories of Fintech companies and products. Billing for AI products and fraud screening is already shifting. What comes next is turning the raw infrastructure of AI, compute itself, into a vertical financial stack to manage the coming cost explosion.

Uber burned through its entire AI token budget in four months. Enterprise AI clearly has a cost problem that we’re only just starting to see.

Fintech is beginning to react. There’s a swathe of AI billing and AI routing companies helping to manage this cost problem. Even Stripe acquired Metronome, which specializes in this type of billing. Fal AI and OpenRouter help companies buy tokens from lower-cost providers. And an era of “harness” or control plane companies is emerging to help manage token spend, security, and compliance.

❝

Ramp says AI token overspend is much worse than SaaS sprawl — at least there you had a monthly bill to check. With AI tokens, what do you check? What’s the dashboard? So Ramp built what they needed: an ability to measure token usage directly, the same way you can measure and manage SaaS spend.

Tokenmaxxing is Stupid Until it Isn’t.

I covered how companies are starting to do more internally to control their token cost in Tokenmaxxing is Stupid Until it Isn’t. And you should read that if you haven’t already.

But as you diagnose and begin to address the cost problem, I believe enterprises will eventually want to control where their tokens are generated. The hardware, the neoclouds, and the data centers.

I think the next fintech category is being born today.

A couple of weeks ago, I profiled three companies that specialize in managing compute (the billing, pricing, and aggregation of GPU workloads). While writing, I couldn’t shake the feeling that something bigger was forming. I think compute marketplaces, sourcing, and billing are becoming a whole category of finance.

Because not everyone buying compute is an AI lab. As cost pressures bite, companies will want to train, fine-tune, and use custom models, or run models on cost-competitive infrastructure.

Broadly, AI is becoming a utility input to the entire economy. So the infrastructure that finances, prices, and manages that input will become critical. At least as critical as energy finance. Possibly more so.

Because no commodity in history has had demand compound this fast.

Today

Compute is becoming a commodity, but isn’t yet priced like one.
The compute financial stack has three layers: price discovery, sourcing, and billing.
Compute is hard to commoditize, but the demand is real.
And the Wall St guys are circling.

1. Compute is becoming a commodity.

Anthropic’s revenue explosion is the fastest in history, bar none. It’s extraordinary.

$700 billion in planned capex from four companies. Compare this to global telecoms, the great buildout of the dotcom era, where even today the entire industry spends $300 billion. Oil and gas, the commodity that powered the 20th century, is around $1 trillion. We’re watching compute capex approach energy capex in a fraction of the time.

Meta, Microsoft, Alphabet, and Amazon have funded capex from free cash flow, but that model is showing early signs of breaking. All the hyperscalers are turning to capital markets to fund their expansion ambitions.

Meta is partnering with Blue Owl Capital (yes, that Blue Owl) on a $27bn data center project.
Oracle debt is trading like a junk bond with credit default swap spreads flaring.
The FT did a deep dive into the sheer scale of private capital coming into the AI building boom.

But as AI is scaling, the exponential growth in costs cannot be borne by the customers of compute.

Uber’s CTO blew through the company’s entire 2026 AI budget in the first few months with per-engineer API costs running between $500 and $2,000 a month. NVIDIA’s VP of applied deep learning said that for his team, compute costs now exceed employee salaries. Ramp’s data shows average monthly AI token spend across its customers up 13x since January 2025.

This is unsustainable.

And it’s not just Uber. Microsoft canceled thousands of internal Claude Code licenses last month after costs spiraled past expectations, six months into the pilot. GitHub is moving all Copilot plans to usage-based billing on June 1st, explicitly because agentic workflows consume too much compute for flat-rate pricing to survive.

Recently, Polymarket reported on a company that had allegedly spent $500m on AI tokens in a single month after not setting employee token limits. And Goldman Sachs expects AI agent token usage to grow 24x by 2030.

“Goldman’s bullish case is that monthly token use could reach 120 quadrillion by 2030, while inference cost per token keeps falling 60%-70% per year. The fight is now between agent productivity and token waste.”
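The two Goldman figures pull in opposite directions, so the arithmetic is worth a quick look. This is a minimal sketch, assuming the 24x usage growth and the 60%-70% annual price decline both apply over the same four-year window. The window length and the function names are my assumptions, not Goldman’s.

```python
def net_spend_multiplier(usage_growth: float, annual_price_decline: float, years: int) -> float:
    """Total token spend at the end of the window, relative to the start."""
    return usage_growth * (1 - annual_price_decline) ** years


def breakeven_decline(usage_growth: float, years: int) -> float:
    """Annual price decline at which total spend stays flat over the window."""
    return 1 - usage_growth ** (-1 / years)


if __name__ == "__main__":
    # Assumed inputs: 24x usage growth over an assumed 4-year window.
    for decline in (0.60, 0.70):
        multiplier = net_spend_multiplier(24, decline, 4)
        print(f"{decline:.0%} annual decline -> spend multiplier {multiplier:.2f}x")
    print(f"break-even annual decline: {breakeven_decline(24, 4):.1%}")
```

On these inputs the outcome hinges on the price-decline assumption. The break-even is an annual decline of roughly 55%, so any slowdown in price cuts, or any usage growth beyond forecast (token waste included), tips total spend upward.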

The start-ups have already understood this, and you can see it in the latest Brex spend data: the two companies winning new spend are fal.ai and OpenRouter, which help you buy AI tokens from cheaper, open-weight models like Qwen and Moonshot.

Every company selling AI on a subscription is running toward a pricing cliff, and the companies buying AI on a subscription haven’t realized the price is about to change. Or, in many cases, the lab you signed with has quietly changed it already, and is building its product to consume far more tokens with longer-running tasks.

We’ve seen this price shift before.

In 2010, AT&T killed unlimited data plans because smartphone usage was overwhelming the infrastructure. In 2026, Anthropic, GitHub, and soon anyone who sells an AI product will be changing pricing because AI usage is overwhelming infrastructure.

With the SpaceX S-1, we’re starting to see these massive companies go public. Right now, AI labs can subsidize usage to grab market share. In public markets, the pressure on margin will be substantial, and the era of burn-baby-burn for tokens is reaching its final countdown.

xAI recorded an operating loss of $2.47 billion on $818 million in revenue during the first quarter of 2026.
OpenAI estimates an annual loss of roughly $14 billion for 2026 (on $13bn revenue).
Anthropic has a positive contribution margin when you exclude training. And they were the first to cut off all-you-can-eat billing for some users.

The solution here, as Goldman suggests, is likely inference costs getting cheaper and companies getting much better at token efficiency. As I wrote in Tokenmaxxing is Stupid Until it Isn’t:

❝

Unmeasured AI tool use proliferation is a drag.

AI adoption is not effective AI adoption. In most cases, it is the opposite.

Observable token usage gives your enterprise the ability to tell which teams are burning tokens productively and which are running agents in idle loops, and who’s just using ChatGPT to produce slop instead of doing the work.

The first sign of AI’s cost explosion showing up in Fintech is that spend management is becoming dedicated token spend management.

Ramp’s entire Series F product blog centered on the idea that tokens are the third major category of spend after labor and vendors. Therefore, they need to be managed by CFOs at that macro level.

But there’s something even more interesting happening a layer further down.

Most committed data center builds have not yet been completed; most data center projects are behind schedule. Only about 3 out of 10 are on track. The rest are delayed or canceled. The arrival of new compute supply is not compressing the cost of producing tokens, because supply is not arriving fast enough.

Instead, at the compute layer, raw token generation in data centers is becoming more competitive. As neoclouds and data centers emerge, they’re creating a market where the ability to quickly produce cheap tokens from cheap energy is a competitive advantage. As this layer becomes more competitive, a new type of fintech company is emerging.

The pricing, billing, and financing infrastructure is still limited.

We need new financial products for this market. The market demands them. Oil got this in the 1980s, electricity in the 1990s, and bandwidth... well, we’ll come back to bandwidth.

The emerging compute financial stack has three layers.

2. The Compute Financial Stack

Layer 1: Price Discovery

What is it? Finding the price of compute infrastructure (e.g., H200 GPUs) across multiple providers.

GPU compute has historically been priced by the hyperscalers in giant one-off deals for the labs, and this type of compute pricing will continue (see Anthropic’s new $2.5bn deal with SpaceX for Colossus data center access for inference).

This stack is for everyone else. The folks not spending billions, tens of billions, or hundreds of billions on compute.

For example: Silicon Data tracks rental pricing across major GPU architectures (H100, A100, H200, B200), normalized for configuration and provider. Backed by DRW and Jump Trading, they launched the first daily GPU rental price index on Bloomberg in 2025. Lenders writing large loans to companies buying compute now have a fair way to assess compute pricing.
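To make “normalized for configuration and provider” concrete, here is a toy sketch of how a daily rental index could be computed. It is not Silicon Data’s methodology, which I haven’t seen. The quotes, provider names, and the choice of a median are all hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical quotes: (provider, gpu_model, gpus_in_cluster, cluster_price_per_hour_usd)
QUOTES = [
    ("provider_a", "H100", 8, 20.00),
    ("provider_b", "H100", 8, 24.00),
    ("provider_c", "H100", 1, 3.50),
    ("provider_a", "B200", 8, 48.00),
    ("provider_b", "B200", 8, 40.00),
]


def daily_index(quotes):
    """Median per-GPU-hour price for each GPU model.

    Dividing by cluster size normalizes for configuration; taking the
    median across providers damps any single provider's outlier quote.
    """
    per_gpu = defaultdict(list)
    for _provider, model, gpu_count, cluster_price in quotes:
        per_gpu[model].append(cluster_price / gpu_count)
    return {model: median(prices) for model, prices in per_gpu.items()}


if __name__ == "__main__":
    for model, price in sorted(daily_index(QUOTES).items()):
        print(f"{model}: ${price:.2f} per GPU-hour")
```

A real index would also have to handle interconnect, contract length, region, and reliability, which is exactly why standardization is the hard part.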

As of early 2026, Silicon Data was showing zero on-demand availability across 90% of providers and renters subletting clusters to each other. When you can’t even find supply at any price, you don’t have a market. You have a commodity screaming for price transparency.

This scenario is exactly like the 1970s oil crisis. OPEC embargoes caused severe supply shocks, leaving gas stations empty. Because buyers couldn't find oil at any price, the market grew desperate for transparency, directly leading to the creation of the WTI crude oil futures market on the NYMEX to find fair value.

There’s just one issue: an H100 is not a B200, and it is not a GB200. Corn is still corn. Can you really standardize something this heterogeneous?

Layer 2: Aggregation and Sourcing

What is it? Finding GPUs you can use for the AI task in front of you, and helping you run that task.

Market context: Companies like Revolut, Nubank, Ramp, Cursor, and Browserbase are now building their own foundation models. These models outperform the frontier models if they’re trained on a specific workflow with a custom, non-public data set. Cursor now has more data about coding than is available on the web. But they need to find somewhere to actually fine-tune or build that model.

For example: Prime Intellect aggregates GPU supply from 50+ providers and helps companies train, fine-tune, and run evals on their models. They’ve built out the RL post-training infrastructure that lets you close the loop from deployment back to retraining for clients like Ramp, Browserbase, and even NVIDIA.

The missing piece is a competitive cost marketplace. A place where GPU supply meets demand with price signals, not email threads. Prime Intellect is closer to that than anyone outside the hyperscalers.

Layer 3: Billing and Metering

What is it? Billing and metering for the utilization of the GPUs and compute.

Market context: The compute economy is fragmented in a way SaaS wasn’t. Neoclouds, regional colos, and small GPU providers don’t have the engineering teams to wire up a generic billing API.

For example: Internet Backyard automates the quote-to-cash workflow for data centers and GPU providers: quoting, metering, invoicing, collections, and payment routing. Their first product, gnomos, replaces the spreadsheet-and-manual-handoff chaos that sits between sales, ops, and finance at most compute providers.

The “Stripe for GPU billing” positioning is sharp, but it made me wonder whether Stripe itself just absorbs this category, too.

This is a category regardless of who wins it.

The longer-term play for all three layers is data. Whoever aggregates pricing, supply, and billing data across the compute economy controls the information layer. That’s the play that turns plumbing into a platform.

(I also wonder if this entire stack needs to become verticalized. I’d imagine if someone did it, you’d be looking at a DoubleClick-sized moment that creates the behemoth of the AI future.)

But before I get way ahead of myself, all of that assumes we can actually turn compute into a commodity.

3. Can Compute Actually Be Commoditized?

Corn is corn. An H100 isn’t a B200.

Every configuration, network connection, and the entire power supply chain matters immensely. Not all data centers are fully reliable because of their power source or capacity. So you’d be right to assume it’s hard to write a futures contract with a market that is so far away from a standard like “Brent Crude.”

Except.

Anthropic’s CFO, Krishna Rao, said something on Invest Like The Best that reframed this for me. He described how Anthropic uses three chip platforms (Amazon Trainium, Google TPUs, and NVIDIA GPUs) fungibly. They built a custom orchestration layer that lets them swap workloads across architecturally different chips, and he explicitly framed this flexibility as a strategic advantage.

❝

Across those three chip platforms, we're using compute for all of those internal and external uses. And that flexibility—it actually took us a long time to be able to do that... We've invested very heavily to be able to use that compute incredibly flexibly. And then we look across the different generations of those chip platforms and use each generation for the best workload internally.

Anthropic CFO Krishna Rao

If the company building frontier models treats its own compute as fungible in practice, the market can treat compute as fungible in pricing. The asset doesn’t need to be identical. It needs to be interchangeable enough for a contract to reference it. And remember, mortgages aren’t identical either. We can still wrap them into a container, and in a way, that’s what Silicon Data’s index, Ornn’s OCPI, and eventually CME’s futures are doing for compute.

While brainstorming this, Claude helpfully pointed out we’ve had false dawns with commoditizing tech infrastructure before.

And that false dawn was, rather infamously, Enron. Yes. That Enron.

Quite aside from being one of the biggest corporate scandals in history for its flagrant accounting fraud, Enron did try to commoditize a new asset.

They tried to build a bandwidth trading floor in 1999, making the exact “bandwidth is the new commodity” argument that people make about compute now. And you guessed it, it collapsed. The thesis was actually right (bandwidth did eventually commoditize), but the market structure was premature. The liquidity wasn’t there.

Bandwidth commoditized in price through a massive tech supply shock. Wavelength Division Multiplexing (WDM) multiplied fiber capacity overnight, crashing transit prices by 30% to 50% annually.

Yet bandwidth never became a financial commodity like corn, because it is location-bound and perishable: unused capacity vanishes instantly. But it did evolve into a wholesale utility market, behaving more like air cargo or commercial real estate.

This is similar for AI compute: Compute needs to become a utility on price.

The difference for compute: CME is building the exchange, DRW is backing the index, and actual enterprises are already making fungible compute decisions at scale. That’s a different starting position than Enron’s bandwidth play.

So if compute is commoditizing, even imperfectly, what happens next?

Once you have a commodity, traders want to trade it.

The CEO of BlackRock, Larry Fink, told the Milken Conference that a new asset class will be buying futures of compute because of the power supply chain it relies on so heavily:

❝

"We're short power, we're short compute, we're short chips... I actually believe a new asset class will be buying futures of compute."

BlackRock CEO Larry Fink

Five days later, CME Group and Silicon Data announced they’re building exactly that. A compute futures market, pending regulatory review, and launching later this year. DRW’s Don Wilson called compute “the largest commodity in the world.”

The contracts will be cash-settled against Silicon Data’s daily GPU rental price indices. AI companies can hedge training costs. Data center operators can lock in revenue. Lenders can underwrite GPU-backed debt against a reference rate instead of a spreadsheet guess.
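The mechanics of a cash-settled hedge are easier to see with numbers. This is a minimal sketch of the buyer’s side, assuming the buyer’s actual rental price equals the settlement index (no basis risk) and ignoring margin and fees. The function name and the figures are hypothetical, not contract terms from CME or Silicon Data.

```python
def hedged_compute_cost(gpu_hours: float, futures_price: float, settlement_index: float) -> float:
    """Net cost for a buyer who goes long futures, then rents compute at spot.

    The futures leg pays (index - futures_price) per GPU-hour in cash,
    which offsets whatever the spot rental market does.
    """
    spot_cost = gpu_hours * settlement_index
    futures_payoff = gpu_hours * (settlement_index - futures_price)
    return spot_cost - futures_payoff


if __name__ == "__main__":
    # Hypothetical: lock in $3.00 per GPU-hour for 10,000 GPU-hours.
    for index in (2.00, 3.00, 4.50):
        cost = hedged_compute_cost(10_000, 3.00, index)
        print(f"index settles at ${index:.2f} -> net cost ${cost:,.0f}")
```

Whatever the index does, the net cost comes out at the locked-in price times the hours. A data center operator would take the other side to lock in revenue, and in practice the gap between your own rental price and the index is the risk that remains.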

The financial stack for oil took decades to mature. Electricity took years. Compute is building its stack in months.

We have at least $700 billion in annual capex (and growing), trillion-dollar infrastructure commitments, and no financial infrastructure to price, source, bill for, or manage the risk of any of it. When an input scales like that, its price usually commoditizes, and if it becomes a critical economic input, we get commodity exchanges and futures markets.

That’s why I think this is the next great financial services infrastructure buildout.

Because today this all runs on spreadsheets, Slack threads, and vibes.

That won’t last for long.

ST.