AI’s unit economics are broken — and your cloud bill is going to prove it
Look at what has happened in the AI market over the last couple of months.
GitHub paused new sign-ups for Copilot Pro and a week later moved the whole product to usage-based billing. Tom’s Hardware reported that up to half of US data center capacity slated for 2026 is at risk of slipping. DRAM contract prices jumped 90% in a single quarter. The chief economist of Goldman Sachs went on the record saying AI added “basically zero” to US GDP in 2025. MIT published a preliminary study finding that 95% of enterprise GenAI pilots delivered no measurable business return.
Any two of these in isolation could be coincidence. All five together is a pattern. And we have not even gotten to OpenAI’s actual numbers yet.
The pattern is this: the unit economics of AI today are broken. Not “tight”. Not “rounding-error tight”. Broken. And the most interesting part — not even the vendors are pretending otherwise anymore.
By “broken” I mean the simple thing: the prices people pay today, the plans the vendors sell today, and the capex projections funding all of it do not reconcile to a profitable business at scale. Somebody downstream is going to absorb the difference. The interesting question is who, and when.
The canary just keeled over
If you want to know whether a market is healthy, you do not look at what the analysts say. You look at what the company at the top of the value chain does when nobody is watching. And on April 20, 2026, GitHub did something interesting.
They paused new sign-ups for Copilot Pro, Pro+ and Student plans. They pulled Opus models out of Pro entirely. They announced that even Pro+ would be losing Opus 4.5 and 4.6 — only Opus 4.7 stays. They tightened premium request limits. And they offered everyone a refund window until May 20.
You can read GitHub’s framing of this and it is — let us say — diplomatic. They blame “agentic workflows”, they talk about “service reliability”. But buried in the announcement is the most honest sentence I have read from a vendor in months:
“Long-running, parallelized sessions now regularly consume far more resources than the original plan structure was built to support… a handful of requests can exceed the plan price.”
Read that twice. A handful of requests can exceed the plan price.
This is not a marketing line. This is GitHub admitting that flat-rate pricing on agentic AI workloads does not work. The math broke under their feet. They are the company that built Copilot, that runs on Azure, that has the closest commercial relationship with OpenAI of anyone in the market — and they could not make the monthly Pro fee cover the cost of the product they were selling.
If GitHub cannot make Copilot’s unit economics work, what makes you think your in-house GenAI feature is going to?
And then, one week later, GitHub did the other thing.
On April 27, 2026 — the day before this post went out — GitHub announced that effective June 1, 2026, the Premium Request system is gone. Copilot is moving to token-metered GitHub AI Credits: Pro $10 a month, $10 in credits. Business $19, $19 in credits. Enterprise $39, $39 in credits. Sit with that for a moment. That is a base subscription with zero margin baked in. They are passing the compute through at par.
The framing in the announcement is more candid than the April 20 one. From GitHub’s own post:
“A quick chat question and a multi-hour autonomous coding session can cost the same amount.”
The previous structure, GitHub says, is “no longer sustainable”. Vendors do not write the words “no longer sustainable” about their own product unless the spreadsheet has truly stopped balancing.
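You can reproduce the broken math on the back of an envelope. Here is a minimal sketch with assumed per-token prices and assumed token counts — neither is GitHub's actual data — of what a quick chat question versus a multi-hour agentic session costs under per-token metering:

```python
# Back-of-envelope sketch: why flat-rate plans break under agentic workloads.
# All prices and token counts below are illustrative assumptions, not
# GitHub's or any vendor's actual rates.

PRICE_IN = 5.00 / 1_000_000    # $ per input token (assumed frontier-model rate)
PRICE_OUT = 15.00 / 1_000_000  # $ per output token (assumed)

def request_cost(tokens_in: int, tokens_out: int) -> float:
    """Cost of a single metered request at the assumed rates."""
    return tokens_in * PRICE_IN + tokens_out * PRICE_OUT

# A quick chat question: small prompt, small answer.
chat = request_cost(tokens_in=2_000, tokens_out=500)

# A multi-hour agentic session: hundreds of tool-calling turns, each one
# re-sending a growing context window. These token counts are assumptions.
agent = request_cost(tokens_in=30_000_000, tokens_out=1_500_000)

print(f"chat question:   ${chat:.4f}")   # ~ $0.02
print(f"agentic session: ${agent:.2f}")  # ~ $172.50
print(f"sessions covered by a $10 plan: {10 / agent:.2f}")  # ~ 0.06
```

Under these assumptions one agentic session burns through seventeen months of a $10 plan. Tune the numbers however you like; the shape does not change.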
There is a pattern here that goes well beyond GitHub. Anthropic raised Claude API prices last year. OpenAI walked back the cheap GPT-4o tier. Cursor restructured. Replit changed their plans twice. The flat-rate, all-you-can-eat era of AI assistants is over and it lasted about eighteen months. The vendors gave you a taste of what AI could do at a price that did not reflect what AI actually costs, and now the bill is being recalibrated in public.
The physical world is hitting back
But ok, GitHub repricing is one thing. Maybe it is a marketing problem. Maybe agentic workflows just need a different commercial model. What about the fundamentals — the actual cost of running this stuff?
This is where it gets ugly.
Tom’s Hardware, summarising research from Sightline Climate, reports that 30 to 50% of the US data center capacity slated for 2026 is at risk of slipping — only about 5 GW of the roughly 16 GW announced for that year is visibly under construction. The bottlenecks are not what you read in keynote slides — they are not “talent shortages” or “regulatory complexity”. They are reinforced concrete and 230 kV transformers and copper bus bars and parts coming out of factories in Shenzhen.
Specifically:
- Power infrastructure — utilities cannot deliver enough capacity fast enough. Substations take three to five years to build. Transformer lead times are measured in years, not months.
- Parts from China — despite years of “onshoring”, the actual components that go into a hyperscale data center still depend on Chinese factories. The political class has not solved this problem. Maybe they will. But not before the buildout slips.
- Water — the cooling infrastructure for AI clusters consumes water at a rate that local jurisdictions are starting to push back on, hard.
- Local opposition — communities are saying no. Loudly. Especially in the places that have already absorbed one or two hyperscale builds.
Sam Altman talks about insatiable demand. Satya Nadella talks about Azure capacity. Jensen Huang talks about a billion GPUs. Those are nice numbers. But you cannot build a data center with a press release. You need permits, switchgear, transformer windings, water rights, and concrete pours. The physical world has limits that do not move because someone in San Francisco wishes they did.
The buildout schedule that all the AI revenue projections depend on — that schedule is slipping. It is slipping right now, in real US counties, with real planning departments. Nobody is going to mention it on the next earnings call.
The RAM crisis nobody warned you about
Ok, let us say the data centers do get built. Slower than promised, but built. There is still a problem: you have to put memory in them. And memory just got really, really expensive.
The numbers from TrendForce, IDC and IEEE Spectrum are not subtle. DRAM contract prices went up 90% in Q1 2026 compared to Q4 2025. Server DRAM specifically went up more than 60% quarter-over-quarter. Memory shortages are projected to last through 2027 and possibly into 2028.
Why? Because:
- Data centers now consume over 70% of high-end memory chips produced worldwide. Seventy. Percent.
- AI is projected to consume nearly 20% of global DRAM supply in 2026 on an equivalent-wafer basis (TrendForce). That number was effectively zero five years ago.
- HBM — the high-bandwidth memory that ships next to GPUs — uses approximately 3× the wafer capacity per bit compared to DDR5. Producing HBM is brutally inefficient at the silicon level. Manufacturers are dedicating wafer lines to it because they make more money per wafer, but every HBM bit produced means three DDR5 bits the rest of the world does not get.
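To make that wafer trade-off concrete, here is a toy calculation. The only input taken from the reporting above is the roughly 3× wafer capacity per bit for HBM; the wafer counts are made-up round numbers:

```python
# Toy arithmetic for the HBM-vs-DDR5 wafer trade-off. The only figure taken
# from the reporting is the ~3x wafer area per bit for HBM; the absolute
# numbers are illustrative.

HBM_WAFERS_PER_BIT = 3.0   # HBM needs ~3x the wafer area per bit (from the text)
DDR5_WAFERS_PER_BIT = 1.0  # DDR5 as the baseline

wafers = 90.0  # a manufacturer redirects 90 wafers from DDR5 to HBM (assumed)

ddr5_bits_lost = wafers / DDR5_WAFERS_PER_BIT   # 90 units of DDR5 bits gone
hbm_bits_gained = wafers / HBM_WAFERS_PER_BIT   # only 30 units of HBM bits made

print(f"DDR5 bits foregone: {ddr5_bits_lost:.0f}")
print(f"HBM bits produced:  {hbm_bits_gained:.0f}")
print(f"exchange rate: {ddr5_bits_lost / hbm_bits_gained:.0f} DDR5 bits per HBM bit")
```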
You see where this is going. It is not just “AI is expensive”. It is “AI is making everything else expensive too”. The laptop your developer needs. The server hosting your boring CRUD app. The gaming PC you bought your kid. They all just got more expensive because hyperscalers are vacuuming up the wafer pipeline to feed inference.
This is a cost that does not show up on your Azure OpenAI invoice. It shows up two years from now, when you go to buy hardware for an unrelated project and the quote comes back 40% higher than it would have been. The architect who shrugged at AI pricing in 2025 is now also explaining to the CFO why the on-prem refresh budget exploded.
Goldman Sachs, MIT, and the GenAI Divide
Now you might say — Tomasz, this is all anecdote. The vendors will figure out pricing. The data centers will get built eventually. The memory market will normalize. Show me the macro evidence.
Ok. Let me show you the macro evidence.
Goldman Sachs. Jan Hatzius, the chief economist of Goldman Sachs, said publicly that AI’s contribution to US GDP growth in 2025 was “basically zero”. His words. Not mine. Not some bearish blogger. The chief economist of Goldman Sachs.
The reason is interesting: most of the AI investment in the US is going into hardware imported from Taiwan and Korea. So when Microsoft buys 50,000 GPUs from NVIDIA, those GPUs are largely fabricated at TSMC's fabs in Taiwan. The dollars leave the country. The investment shows up in Taiwanese and South Korean GDP, not American GDP. A more recent Goldman note is more careful — visible productivity gains so far appear concentrated in a small number of localized use cases, with broader economy-wide effects expected later, not now.
So the trillions in capex are not, so far, producing macroeconomic returns to the country doing the spending. That is awkward. To be clear, this is a macro-accounting and timing observation. A weak GDP signal does not by itself prove that any specific AI vendor is unprofitable — that is a different argument, and the OpenAI numbers are coming up.
MIT NANDA. A preliminary “State of AI in Business 2025” report from MIT’s NANDA initiative looked at 52 executive interviews, 153 leader surveys, and 300 public AI deployments. The headline finding: 95% of enterprise GenAI pilots delivered no measurable P&L impact.
Read that one again. Out of every twenty AI pilots run inside enterprises in 2025, nineteen produced zero detectable business return. And those nineteen pilots spent an aggregate $30 to $40 billion to do it.
If you have been on the practitioner side of this — running these projects, presenting to steering committees, watching the ROI numbers slip every quarter — you already knew. The MIT report just put it in writing. Buying tools from specialized vendors works about 67% of the time. Internal builds work about a third as often. The technology is real. The corporate path to value through it is mostly broken.
So we have the vendors repricing because the unit economics do not work. The physical buildout slipping because of power, water, and supply chain. The memory market in crisis because of the wafer math. The macro economists saying GDP impact was basically zero in 2025. The academic researchers saying 95% of corporate pilots return nothing.
That is not a few warning signs. That is the whole instrument panel lit up red.
OpenAI, the $1.4 trillion bill
I want to come back to the vendor side, because this is the part that affects you most directly when you quote an AI workload to a client.
The biggest pure-play AI company in the world is OpenAI. Whatever runs in your Azure OpenAI deployment is, ultimately, running on top of OpenAI’s economics. So those economics matter to you.
Here is the picture, broken out by what each number actually measures. Revenue and operating spend are P&L figures. Cumulative burn is a multi-year cash-flow projection. The $1.4 trillion is a multi-year capex commitment, not a 2025 line item. They live in different accounting buckets and they should not be added together — but each one is bad on its own:
- 2025 revenue: ~$13 billion (Reuters)
- 2025 reported operating spend: ~$8 billion (Reuters), with the operating math already negative once cost of revenue is layered in
- Cumulative cash burn projected through 2029: ~$115 billion (Reuters)
- Eight-year compute commitments: ~$1.4 trillion (Sam Altman, public statement)
That last number is roughly the annual GDP of Indonesia. With a T.
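If you want to sanity-check the scale yourself, the ratios fall straight out of the figures above. A small sketch, using only the numbers quoted in this section (reported, not audited), and computing ratios rather than adding across accounting buckets:

```python
# Scale check on the figures above, using only the numbers quoted in the
# text (Reuters reporting and public statements, not audited filings).
# Ratios, not sums -- these live in different accounting buckets.

revenue_2025 = 13e9           # ~$13B 2025 revenue
burn_through_2029 = 115e9     # ~$115B projected cumulative cash burn
compute_commitments = 1.4e12  # ~$1.4T eight-year compute commitments

print(f"commitments vs one year of revenue: {compute_commitments / revenue_2025:.0f}x")
# ~108x: the commitments equal roughly a century of revenue at the 2025 level.

print(f"projected burn vs one year of revenue: {burn_through_2029 / revenue_2025:.1f}x")
# ~8.8x: the burn through 2029 is almost nine 2025s of revenue.
```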
Then there is the most-cited, least-clean figure. From leaked Microsoft revenue-share documents, OpenAI was reportedly burning roughly $2 in compute for every $1 of inference revenue. The direction of travel is clear — inference is structurally expensive, OpenAI’s adjusted gross margin reportedly collapsed from around 40% to 33% in 2025, and inference costs reportedly quadrupled over the year. But the precise compute-to-revenue ratio depends on Microsoft figures that TechCrunch has flagged are net of royalties Microsoft pays back to OpenAI, which means the headline number is contested. Treat that ratio as directionally damning, not audited.
Now ask yourself: when you call your Microsoft account team for an Azure OpenAI quote, what do you think is in that price? Microsoft’s margin? OpenAI’s margin? Or two-thirds of an “investment thesis” that someone is hoping the next round of investors pays for later?
OpenAI’s path to profitability requires one of three things:
- Massive price increases — which break their customers’ unit economics, which is exactly what GitHub had to admit last week
- Massive efficiency gains in inference — which require new hardware that is bottlenecked by data center buildout and HBM wafer constraints we just walked through
- Something else materializing — a productivity miracle, government contracts, an IPO greater-fool exit, a breakthrough that bends the cost curve
None of those are guaranteed. Most of them are not even probable on the timeline OpenAI needs them. And while they figure it out, the price your client pays for that token is partly funding the $115 billion hole.
One honest disclaimer before we move on: most of the OpenAI numbers above come from Reuters reporting, leaked documents and public statements, not audited financial filings. OpenAI is not a public company. Some of these figures will move when better data appears, and a few of them — especially the leaked compute-to-revenue ratio — are explicitly contested. The qualitative picture holds up: very large operating losses, very large multi-year commitments, structurally expensive inference. The exact decimal places, less so. I would rather you walk away with the right shape of the problem than with one wrong number you remembered for the wrong reason.
What I am actually doing in projects right now
Now — and this is where I want to be careful — I am not telling you to walk away from AI. I use Claude every day. I use GitHub Copilot every day. I build AI solutions and I help organizations run their AI programs in production. The value is real. My productivity is up. The teams I work with are shipping faster. None of that is in question.
What is in question is the economics underneath all of it. And those still do not add up. So I am building like I know the bills are coming.
The good news here is that you do not have to invent a cost-control playbook. Every major vendor has published one. Microsoft, AWS, Google Cloud, OpenAI and Anthropic all converge on roughly the same handful of practices. The bad news is that most teams I see do not actually do most of them. Here are the ones I follow on every project, with the receipts — and, after the list, minimal code sketches of what each one looks like in practice:
- Right-size the model. Aggressively. This is the single biggest cost lever you have, and every vendor agrees. Microsoft Learn puts model selection at the top of the cost optimisation list. AWS calls it “the single most impactful FinOps decision”. Google’s Vertex AI guidance is to run your task on Gemini Flash first and only escalate to Pro for the things that genuinely need complex reasoning. The pattern is the same everywhere — most workloads do not need a frontier model. Default small. Promote to big only on the high-value paths.
- Turn on prompt caching. This one almost feels like cheating. OpenAI documents up to 90% reduction in input token cost and 80% reduction in latency with prompt caching enabled, and it is automatic on recent models. Anthropic charges 10% of the base input price for cache reads, so the one-time cache-write premium pays for itself after a single read. AWS Bedrock reports the same up-to-90% number. The implementation rule is identical across all of them: put static content (system prompt, instructions, large context) at the start, variable content at the end. Most teams I see have not flipped this switch yet, which is wild.
- Use the Batch API for anything that does not need to be real-time. Across OpenAI, Anthropic and AWS Bedrock, batch inference is roughly 50% cheaper than synchronous inference. Microsoft’s guidance is direct: any workload tolerating 24-hour latency should go through the Batch API. Document analysis, embedding generation, nightly content pipelines, scheduled summaries — push them all to batch and recover half the bill.
- Set max token limits on every deployment. Microsoft’s cost-management guidance for Azure OpenAI is direct on this: applications that do not cap output leak tokens at every call — text gets generated, gets billed, and most of it never reaches the user. The fix is one configuration parameter. The exact size of the saving is workload-specific, but the direction is always one way.
- Tag every AI resource and pipe it into the vendor’s cost tool. Azure Cost Management, AWS Cost Explorer and Google Cloud cost tracking all do per-tag chargeback. If your AI features are not tagged with cost-center, project, environment and owner, you do not actually know what you are spending — you have a number, but you do not have a story. The aggregate “Azure OpenAI bill” tells you nothing useful. You need to know which feature, which user segment, which session generated which percentage of that bill.
- Hunt down idle deployments and ghost endpoints. Fine-tuned model hosting in Azure incurs hourly fees regardless of traffic. A Vertex AI notebook with an attached GPU keeps billing whether anyone is using it or not. AWS Bedrock provisioned-throughput models keep billing whether you call them or not. The pattern across all three clouds is identical — AI infrastructure rots into ghost charges faster than any other category of cloud spend, because everyone is afraid to delete things they “might need”. Run a monthly sweep.
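Here is what the first and fourth practices look like together in a minimal sketch: default to a small model, escalate only on flagged paths, and cap output tokens on every call. The model names and the escalation flag are assumptions; substitute whatever your provider and routing signal actually are.

```python
# Minimal router sketch: small model by default, frontier model only on
# explicitly escalated paths, hard output cap on every call. Model names
# and the 'needs_reasoning' flag are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SMALL_MODEL = "gpt-4o-mini"  # assumed cheap default
LARGE_MODEL = "gpt-4o"       # assumed frontier model for hard paths

def complete(prompt: str, needs_reasoning: bool = False) -> str:
    """Route to the small model unless the caller explicitly escalates."""
    response = client.chat.completions.create(
        model=LARGE_MODEL if needs_reasoning else SMALL_MODEL,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,  # hard output cap: uncapped calls leak billed tokens
    )
    return response.choices[0].message.content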
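For prompt caching, a minimal sketch against Anthropic's cache_control API; the model id is a placeholder. With OpenAI the caching is automatic on recent models, but the static-first layout rule is the same.

```python
# Prompt-caching sketch: static content first and cached, variable content
# last. The model id is a placeholder; the system prompt stands in for your
# real static context (instructions, schemas, few-shot examples).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

STATIC_SYSTEM_PROMPT = "...your instructions, schemas, few-shot examples..."

def ask(user_question: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder id: use your current model
        max_tokens=512,
        system=[
            {
                "type": "text",
                "text": STATIC_SYSTEM_PROMPT,            # static content first...
                "cache_control": {"type": "ephemeral"},  # ...and cached
            }
        ],
        messages=[{"role": "user", "content": user_question}],  # variable last
    )
    return response.content[0].text
```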
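For batch workloads, a sketch against OpenAI's Batch API. The document list is a stand-in for your real nightly pipeline, and polling for results is omitted for brevity.

```python
# Batch-inference sketch: one JSONL line per request, uploaded and run at
# the documented ~50% discount in exchange for a 24-hour window. The
# documents and file path are assumptions.
import json
from openai import OpenAI

client = OpenAI()

documents = ["first document...", "second document..."]  # assumed workload

with open("nightly_batch.jsonl", "w") as f:
    for i, doc in enumerate(documents):
        f.write(json.dumps({
            "custom_id": f"doc-{i}",  # your key for matching results later
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",  # assumed cheap default
                "messages": [{"role": "user", "content": f"Summarise: {doc}"}],
                "max_tokens": 256,
            },
        }) + "\n")

batch_file = client.files.create(file=open("nightly_batch.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # the latency you trade for the discount
)
print(batch.id, batch.status)
```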
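Resource tags give you per-resource chargeback, but per-feature attribution has to come out of your own telemetry. A minimal sketch of a wrapper that logs token usage against a feature and cost-center label; the label names and the log destination are assumptions.

```python
# Attribution sketch: log the token usage the API already returns, keyed by
# the labels you need for chargeback. Replace print() with your real log
# pipeline; the label names are assumptions.
import json
import time
from openai import OpenAI

client = OpenAI()

def tracked_completion(prompt: str, *, feature: str, cost_center: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed default
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,
    )
    usage = response.usage  # prompt_tokens / completion_tokens from the API
    print(json.dumps({      # ship this to your real log pipeline instead
        "ts": time.time(),
        "feature": feature,
        "cost_center": cost_center,
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
    }))
    return response.choices[0].message.content
```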
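And for the monthly sweep, a sketch of the shape of it. The inventory here is hard-coded; in practice it comes from your cloud SDK or metrics store, and the 30-day threshold is an assumption to tune per environment.

```python
# Idle-sweep sketch: flag deployments with no recent traffic for review.
# The hard-coded inventory is a stand-in for a real SDK/metrics query;
# the threshold is an assumption.
from datetime import datetime, timedelta, timezone

IDLE_AFTER = timedelta(days=30)  # assumed threshold

deployments = [  # stand-in for your cloud SDK / metrics store
    {"name": "prod-chat", "last_request_at": datetime.now(timezone.utc)},
    {"name": "poc-summariser", "last_request_at": datetime(2026, 1, 3, tzinfo=timezone.utc)},
    {"name": "finetune-host-old", "last_request_at": None},  # never called, still billing
]

cutoff = datetime.now(timezone.utc) - IDLE_AFTER
idle = [d["name"] for d in deployments
        if d["last_request_at"] is None or d["last_request_at"] < cutoff]

print("review before deleting:", idle)  # run this as a monthly sweep
```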
And then there are two things I add to every project anyway, that you will not find in the vendor docs — because the vendor docs assume their pricing is stable, and we have just spent six sections establishing that it is not:
- Pin model versions in client contracts. When we agree on a price for an AI feature, the contract spells out exactly which model version that price assumes. If the vendor deprecates the cheap tier or quietly swaps the underlying model, the price changes too. No surprises in month six.
- Abstract the model boundary so swaps are cheap. If your prompt templates and orchestration logic are so coupled to one provider that switching is a six-month project, you have built yourself into a corner. Eighteen months from now, you will use that abstraction. Guaranteed. A sketch of both practices follows.
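Here is a minimal sketch of what both of those look like together, assuming an OpenAI-style provider; the feature names and pinned versions are illustrative, not recommendations.

```python
# Boundary sketch: one pinned model version per feature, one thin interface
# per provider. Names are illustrative; the point is that a vendor swap
# touches this file and nothing else.
from typing import Protocol

# The pin that goes in the client contract: feature -> (provider, exact version).
PINNED_MODELS = {
    "ticket-triage": ("openai", "gpt-4o-mini-2024-07-18"),  # example pin
    "contract-review": ("anthropic", "claude-sonnet-4-5"),  # placeholder id
}

class ChatProvider(Protocol):
    def complete(self, model: str, prompt: str, max_tokens: int) -> str: ...

class OpenAIProvider:
    def __init__(self) -> None:
        from openai import OpenAI
        self._client = OpenAI()

    def complete(self, model: str, prompt: str, max_tokens: int) -> str:
        r = self._client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
        )
        return r.choices[0].message.content

PROVIDERS = {"openai": OpenAIProvider}  # add AnthropicProvider etc. here

def run_feature(feature: str, prompt: str) -> str:
    provider_name, model = PINNED_MODELS[feature]
    provider = PROVIDERS[provider_name]()  # swapping vendors = editing two dicts
    return provider.complete(model, prompt, max_tokens=512)
```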
This is not paranoid. This is how you architect for a market where the vendor’s own unit economics do not work. You do not bet your client’s product on the vendor figuring out their pricing before your runway runs out.
The party is not over. But somebody is about to get the check.
AI is not going away. The good models will keep getting better. The productivity gains for individual developers and individual workflows are real and I will keep using them.
But the spreadsheet behind all of it does not balance. The vendors know it. The economists know it. The data center planners know it. And starting last week, even GitHub knows it.
Build like you know it too. Make sure the architecture you ship today still works when the price doubles. Because it will.