Look at what has happened in the AI market over the last couple of months.

GitHub paused new sign-ups for Copilot Pro and a week later moved the whole product to usage-based billing. Tom’s Hardware reported that up to half of the US data center capacity slated for 2026 is at risk of slipping. DRAM contract prices jumped 90% in a single quarter. The chief economist of Goldman Sachs went on the record saying AI added “basically zero” to US GDP in 2025. MIT published a preliminary study finding that 95% of enterprise GenAI pilots delivered no measurable business return.

Any two of those in isolation could be coincidence. All five together is a pattern. And we have not even gotten to OpenAI’s actual numbers yet.

So here is the pattern. The unit economics of AI today are broken. Not tight, not rounding-error tight, properly broken. And the part I find most telling is that the vendors have stopped pretending otherwise.

By “broken” I mean the boring, literal thing: the prices people pay today, the plans the vendors sell today, and the capex projections funding all of it do not add up to a profitable business at scale. Somebody downstream is going to eat the difference. The only real question is who, and when.

The canary just keeled over

If you want to know whether a market is healthy, do not listen to the analysts. Watch what the company at the top of the value chain does when it thinks nobody is paying attention. And on April 20, 2026, GitHub did something very interesting.

They paused new sign-ups for Copilot Pro, Pro+ and Student plans. They pulled Opus models out of Pro entirely. They said even Pro+ would lose Opus 4.5 and 4.6, leaving only Opus 4.7. They tightened premium request limits. And they offered everyone a refund window until May 20.

[Figure: GitHub Copilot Individual Plans changes, April 2026: sign-ups paused, Opus models removed, limits reduced]

GitHub’s own framing of all this is, let us say, diplomatic. They blame “agentic workflows,” they talk about “service reliability.” But buried in the announcement is the most honest sentence I have read from a vendor in months:

“Long-running, parallelized sessions now regularly consume far more resources than the original plan structure was built to support… a handful of requests can exceed the plan price.”

A handful of requests can exceed the plan price. That is not a marketing line. That is GitHub admitting flat-rate pricing on agentic AI workloads does not work. The math broke under their feet. This is the company that built Copilot, that runs on Azure, that has the closest commercial relationship with OpenAI of anyone in the market, and they still could not make the monthly Pro fee cover the cost of the thing they were selling.

So if GitHub cannot make Copilot’s unit economics work, what exactly makes you think your in-house GenAI feature is going to?

And then, one week later, GitHub dropped the other shoe.

On April 27, 2026, the day before this post went out, GitHub announced that as of June 1, 2026 the Premium Request system is gone. Copilot moves to token-metered GitHub AI Credits: Pro is $10 a month with $10 in credits, Business is $19 with $19 in credits, Enterprise is $39 with $39 in credits. Sit with that for a second. That is a base subscription with zero margin baked in. They are passing the compute straight through at cost.

The wording this time is even more candid than on April 20. Straight from GitHub’s post:

“A quick chat question and a multi-hour autonomous coding session can cost the same amount.”

The old structure, GitHub says, is “no longer sustainable.” A vendor does not write “no longer sustainable” about its own flagship product unless the spreadsheet has genuinely stopped balancing.
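To see why a flat fee cannot survive that spread, here is a back-of-envelope sketch. The per-token prices and the token counts are made-up placeholders, not GitHub's or any vendor's real rates; only the $10 plan price comes from the announcement above.

```python
# Hypothetical prices in USD per million tokens: placeholders, not real vendor rates.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00
PLAN_PRICE_USD = 10.00  # the Copilot Pro monthly price quoted above


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Metered cost of one request in USD under the placeholder prices."""
    return (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


# Assumed workloads: a short chat question versus a long agentic session
# that re-reads a large context across many tool-calling steps.
quick_chat = request_cost(input_tokens=2_000, output_tokens=500)
agent_session = request_cost(input_tokens=4_000_000, output_tokens=200_000)

print(f"quick chat:    ${quick_chat:.4f}")
print(f"agent session: ${agent_session:.2f}")
print(f"agent sessions one plan fee covers: {PLAN_PRICE_USD / agent_session:.2f}")
```

Swap in your own prices and token counts. The point is the ratio between the two workloads, which is what a single flat price has to absorb.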

And this goes well beyond GitHub. Anthropic raised Claude API prices last year. OpenAI walked back the cheap GPT-4o tier. Cursor restructured. Replit changed their plans twice. The flat-rate, all-you-can-eat era of AI assistants is over, and it lasted all of about eighteen months. They gave you a taste of what AI can do at a price that had nothing to do with what AI actually costs, and now the bill is being recalibrated in public.

The physical world is hitting back

OK, but GitHub repricing is just one company. Maybe it is a marketing problem. Maybe agentic workflows simply need a different commercial model. What about the fundamentals, the actual cost of running this stuff?

This is where it gets ugly.

Tom’s Hardware, summarising research from Sightline Climate, reports that 30 to 50% of the US data center capacity slated for 2026 is at risk of slipping. Only about 5 GW of the roughly 16 GW announced for that year is visibly under construction. And the bottlenecks are not the ones you see on keynote slides. They are not “talent shortages” or “regulatory complexity.” They are reinforced concrete, 230 kV transformers, copper bus bars, and parts coming out of factories in Shenzhen.

Go through them one by one and it is the same story. Power is the big one: utilities cannot deliver capacity fast enough, substations take three to five years to build, and transformer lead times are measured in years rather than months. Then there are the parts. After all the talk of “onshoring,” the components that actually go into a hyperscale data center still lean on Chinese factories, and the political class has not solved that. Maybe they will, but not before the buildout slips. Water is its own fight, because the cooling for these AI clusters drinks it at a rate that local jurisdictions are starting to push back on, hard. And then there is plain local opposition, with communities saying no, loudly, especially the ones that already swallowed a hyperscale build or two.

Sam Altman talks about insatiable demand. Nadella talks about Azure capacity. Jensen Huang talks about a billion GPUs. Lovely numbers, all of them. But you cannot build a data center with a press release. You need permits, switchgear, transformer windings, water rights, and concrete that has actually been poured. The physical world has limits, and they do not move just because someone in San Francisco wishes they would.

The buildout schedule that every one of those AI revenue projections quietly depends on is slipping. Right now, in real US counties, with real planning departments and real annoyed residents. Nobody is going to bring that up on the next earnings call.

The RAM crisis nobody warned you about

OK, let us say the data centers do get built. Slower than promised, but built. You still have to put memory in them, and memory just got really, really expensive.

[Figure: DRAM contract price surge from Q4 2025 to Q1 2026, with AI workloads and HBM driving the shortage]

The numbers from TrendForce, IDC and IEEE Spectrum are not subtle. DRAM contract prices went up 90% in Q1 2026 versus Q4 2025. Server DRAM specifically went up more than 60% quarter over quarter. And the shortage is projected to run through 2027 and maybe into 2028.

Why so brutal? A few reasons stack on top of each other. Data centers now eat more than 70% of the high-end memory chips produced on the planet. AI alone is projected to swallow nearly 20% of global DRAM supply in 2026 on an equivalent-wafer basis, says TrendForce, and that number was basically zero five years ago. On top of that, HBM, the high-bandwidth memory that sits next to the GPUs, burns roughly three times the wafer capacity per bit that DDR5 does. It is savagely inefficient at the silicon level, but manufacturers keep handing wafer lines to it because it pays better per wafer. Every HBM bit they make is three DDR5 bits the rest of us never see.

And that is the part that should worry you even if you never touch a GPU. This is not only AI getting expensive, it is AI dragging the price of everything else up with it. The laptop your developer needs, the server running your boring CRUD app, the gaming PC you bought your kid, all of it got pricier because the hyperscalers are hoovering up the wafer pipeline to feed inference.

None of this lands on your Azure OpenAI invoice. It lands two years from now, when you go to buy hardware for some completely unrelated project and the quote comes back 40% higher than it had any right to be. The architect who shrugged at AI pricing in 2025 is the same one now explaining to the CFO why the on-prem refresh budget blew up.

Goldman Sachs, MIT, and the GenAI Divide

Now, you might be thinking: Tomasz, this is all anecdote. The vendors will sort out pricing. The data centers will get built eventually. The memory market will calm down. Show me the macro evidence.

Fair. Here it is.

Start with Goldman Sachs. Jan Hatzius, their chief economist, said in public that AI’s contribution to US GDP growth in 2025 was “basically zero.” That is not me talking, and it is not some bearish blogger. That is the chief economist of Goldman Sachs.

The reason is genuinely interesting. Most of the AI investment in the US goes into hardware imported from Taiwan and Korea. So when Microsoft buys 50,000 GPUs from NVIDIA, the chips themselves are largely fabricated by TSMC in Taiwan. The dollars leave the country. The investment lands in Taiwanese and South Korean GDP, not American GDP. A more recent Goldman note is more careful about it: the visible productivity gains so far sit in a small number of localized use cases, with the broader economy-wide effects expected later rather than now.

So the trillions in capex are not, so far, producing macroeconomic returns for the country actually doing the spending. That is awkward. To be fair, this is a macro-accounting and timing point. A weak GDP signal does not on its own prove any specific AI vendor is unprofitable. That is a separate argument, and the OpenAI numbers are coming.

Then there is MIT NANDA. Their preliminary “State of AI in Business 2025” report pulled together 52 executive interviews, 153 leader surveys, and 300 public AI deployments. The headline: 95% of enterprise GenAI pilots delivered no measurable P&L impact.

Do the math on that. Out of every twenty AI pilots run inside enterprises in 2025, nineteen produced zero detectable business return. And enterprises collectively sank an estimated $30 to $40 billion into GenAI getting there.

If you have been on the practitioner side of this, running the projects, presenting to steering committees, watching the ROI slide every quarter, you already knew. The MIT report just wrote it down. Buying tools from specialized vendors works about 67% of the time. Internal builds work roughly a third as often. The technology itself is real. The corporate route to actually capturing value from it is mostly broken.

So line it up. Vendors repricing because the unit economics do not work. The physical buildout slipping on power, water and supply chain. The memory market in crisis over wafer math. Macro economists putting GDP impact at basically zero for 2025. Academic researchers finding 95% of corporate pilots return nothing. That is not one or two warning lights. That is the whole instrument panel lit up red.

OpenAI, the $1.4 trillion bill

I want to swing back to the vendor side, because this is the part that hits you most directly the moment you quote an AI workload to a client.

The biggest pure-play AI company on earth is OpenAI. Whatever runs in your Azure OpenAI deployment is, in the end, running on top of OpenAI’s economics. So OpenAI’s economics are your economics, whether you like it or not.

[Figure: OpenAI 2025 unit economics: revenue versus spend, projected cash burn, and $1.4 trillion compute commitment]

One thing first, because these numbers get mashed together in a misleading way. Revenue and operating spend are P&L figures. Cumulative burn is a multi-year cash-flow projection. The $1.4 trillion is a multi-year capex commitment, not a 2025 line item. They sit in different accounting buckets and you should not add them up. But each one is ugly on its own:

  • 2025 revenue: around $13 billion (Reuters)
  • 2025 reported operating spend: around $8 billion (Reuters), and the operating math is already negative once you layer in cost of revenue
  • Cumulative cash burn projected through 2029: around $115 billion (Reuters)
  • Eight-year compute commitments: around $1.4 trillion (Sam Altman, public statement)

That last one is in the same league as the entire annual GDP of a country like Mexico. With a T.

Then there is the most-quoted and least-clean figure of the lot. From leaked Microsoft revenue-share documents, OpenAI was reportedly burning about $2 of compute for every $1 of inference revenue. The direction of travel is not in doubt: inference is structurally expensive, OpenAI’s adjusted gross margin reportedly fell from around 40% to 33% in 2025, and inference costs reportedly quadrupled over the year. But that precise $2-to-$1 ratio leans on Microsoft figures that TechCrunch has flagged are net of royalties Microsoft pays back to OpenAI, so the headline number is contested. Treat it as directionally damning rather than audited.

Now ask yourself the obvious question. When you ring your Microsoft account team for an Azure OpenAI quote, what is actually inside that price? Microsoft’s margin? OpenAI’s margin? Or two-thirds of an “investment thesis” that someone is quietly hoping the next funding round will cover?

OpenAI’s road to profitability needs one of three things to happen:

  1. Big price increases, which break their customers’ unit economics, which is exactly what GitHub just had to admit.
  2. Big efficiency gains in inference, which need new hardware that is bottlenecked by the data center buildout and the HBM wafer crunch we just went through.
  3. Something else turning up: a productivity miracle, government contracts, an IPO greater-fool exit, some breakthrough that bends the cost curve.

None of those are guaranteed. Most are not even likely on the timeline OpenAI needs them on. And while they work it out, the price your client pays for that token is partly plugging the $115 billion hole.

One honest disclaimer before we move on. Most of the OpenAI numbers above come from Reuters reporting, leaked documents and public statements, not audited filings. OpenAI is not a public company. Some of these figures will shift when better data shows up, and a couple of them, especially that leaked compute-to-revenue ratio, are openly disputed. The shape holds either way: very large operating losses, enormous multi-year commitments, inference that is expensive at the structural level. The exact decimal places, much less so. I would rather you walk out of here with the right shape of the problem than with one precise-looking number you memorised for the wrong reason.

What I am actually doing in projects right now

Now, and I want to be careful here, I am not telling you to walk away from AI. I use Claude every day. I use GitHub Copilot every day. I build AI solutions and I help organizations run their AI programs in production. The value is real, my own productivity is up, and the teams I work with are genuinely shipping faster. None of that is in question.

What is in question is the economics sitting underneath all of it, and those still do not add up. So I build like I already know the bills are on their way.

The good news is you do not have to invent a cost-control playbook from scratch. Every major vendor has published one, and Microsoft, AWS, Google Cloud, OpenAI and Anthropic all land on roughly the same handful of practices. The bad news is that most teams I walk into are not actually doing most of them. Here are the ones I follow on every project, with the receipts:

  • Right-size the model, aggressively. This is the single biggest cost lever you have, and every vendor agrees. Microsoft Learn puts model selection at the top of its cost-optimisation list. AWS calls it “the single most impactful FinOps decision.” Google’s Vertex AI guidance is to run your task on Gemini Flash first and only escalate to Pro for the things that genuinely need heavy reasoning. Same message everywhere: most workloads do not need a frontier model, so default to small and only promote to big on the high-value paths.
  • Turn on prompt caching. This one almost feels like cheating. OpenAI documents up to a 90% cut in input token cost and an 80% cut in latency with caching on, and it is automatic on recent models. Anthropic charges 10% of the base input price for cache reads, so it pays for itself after a single read. AWS Bedrock reports the same up-to-90% figure. The rule is identical across all of them: put your static content (system prompt, instructions, big context) at the start and the variable content at the end. Most teams I see have not flipped this switch, which is honestly wild.
  • Use the Batch API for anything that does not need to be real time. Across OpenAI, Anthropic and AWS Bedrock, batch inference runs roughly 50% cheaper than synchronous. Microsoft’s guidance is blunt: any workload that can tolerate 24-hour latency should go through the Batch API. Document analysis, embedding generation, nightly content pipelines, scheduled summaries, push the lot to batch and claw back half the bill.
  • Set max token limits on every deployment. Microsoft’s cost-management guidance for Azure OpenAI is direct about it: apps that never cap output leak tokens on every call. Text gets generated, it gets billed, and most of it never even reaches the user. The fix is a single config parameter. How much you save is workload-specific, but it only ever moves one way.
  • Tag every AI resource and feed it into the vendor’s cost tool. Azure Cost Management, AWS Cost Explorer and Google Cloud cost tracking all do per-tag chargeback. If your AI features are not tagged with cost-center, project, environment and owner, you do not actually know what you are spending. You have a number, but you do not have a story. The aggregate “Azure OpenAI bill” tells you nothing useful on its own. You want to know which feature, which user segment, which session is generating which slice of it.
  • Hunt down idle deployments and ghost endpoints. Fine-tuned model hosting in Azure charges by the hour whether or not anyone uses it. A Vertex AI notebook with a GPU attached keeps billing whether or not it is doing anything. AWS Bedrock provisioned-throughput models keep billing whether or not you call them. It is the same across all three clouds: AI infrastructure rots into ghost charges faster than any other kind of cloud spend, because everyone is scared to delete something they “might need.” Run a sweep every month.
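The caching and batch arithmetic from the list above fits in a few lines. This is a minimal sketch, not a billing model: the base price and the cache-write premium are assumptions for illustration, while the 10% cache-read rate and the roughly 50% batch discount are the figures quoted above. Check your vendor's current price sheet before trusting any of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Illustrative numbers only; replace with your vendor's actual rates."""
    input_per_mtok: float = 3.00          # hypothetical base input price, USD per 1M tokens
    cache_write_multiplier: float = 1.25  # assumed premium for the call that writes the cache
    cache_read_multiplier: float = 0.10   # cache reads at 10% of base, as quoted above
    batch_discount: float = 0.50          # batch roughly 50% cheaper, as quoted above


def prefix_cost(prefix_tokens: int, calls: int, p: Pricing, cached: bool = False) -> float:
    """Cost in USD of sending the same static prefix on `calls` requests."""
    base = prefix_tokens * p.input_per_mtok / 1_000_000
    if not cached or calls == 0:
        return base * calls
    # First call writes the cache, every later call reads from it.
    return base * p.cache_write_multiplier + base * p.cache_read_multiplier * (calls - 1)


def batch_cost(sync_cost: float, p: Pricing) -> float:
    """Cost of the same work pushed through a batch endpoint."""
    return sync_cost * (1 - p.batch_discount)


if __name__ == "__main__":
    p = Pricing()
    for calls in (1, 2, 100):
        plain = prefix_cost(1_000_000, calls, p)
        cached = prefix_cost(1_000_000, calls, p, cached=True)
        print(f"{calls:>3} calls: uncached ${plain:.2f}, cached ${cached:.2f}")
    print(f"batch version of a $6.00 sync job: ${batch_cost(6.00, p):.2f}")
```

Under these assumptions caching loses money on a prefix used exactly once and wins from the second call onward, which is why the static content has to sit at the start of the prompt where it can be reused.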

And then there are two things I bolt onto every project that you will not find in any vendor doc, for the simple reason that the vendor docs assume their own pricing is stable, and we have just spent six sections showing that it is not:

  • Pin model versions in the client contract. When we agree a price for an AI feature, the contract spells out exactly which model version that price assumes. If the vendor kills the cheap tier or quietly swaps the underlying model, the price moves with it. No nasty surprises in month six.
  • Abstract the model boundary so swapping is cheap. If your prompt templates and orchestration are so welded to one provider that switching would be a six-month project, you have boxed yourself in. Eighteen months from now you will use that abstraction. I guarantee it.
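For what those two habits can look like in code, here is a minimal sketch. Every name in it (ChatModel, ModelPin, EchoModel, the "echo" provider, the version string) is hypothetical and invented for illustration; a real adapter would wrap whichever vendor SDK you actually use.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatModel(Protocol):
    """The only surface feature code is allowed to depend on."""

    def complete(self, system: str, user: str, max_output_tokens: int) -> str: ...


@dataclass(frozen=True)
class ModelPin:
    """Mirrors the contract: the quoted price assumes exactly this configuration."""
    provider: str
    model_version: str
    max_output_tokens: int


class EchoModel:
    """Stand-in adapter so the sketch runs offline. It ignores the system prompt
    and the output cap; a real adapter would pass both to the vendor SDK."""

    def __init__(self, pin: ModelPin) -> None:
        self.pin = pin

    def complete(self, system: str, user: str, max_output_tokens: int) -> str:
        return f"[{self.pin.model_version}] {user}"


# Swapping vendors means registering a new adapter here, not rewriting features.
ADAPTERS = {"echo": EchoModel}


def build_model(pin: ModelPin) -> ChatModel:
    return ADAPTERS[pin.provider](pin)


def summarise_ticket(model: ChatModel, pin: ModelPin, ticket: str) -> str:
    """Feature code: knows about the pin and the protocol, nothing vendor-specific."""
    return model.complete(
        system="Summarise the support ticket in one sentence.",
        user=ticket,
        max_output_tokens=pin.max_output_tokens,
    )


if __name__ == "__main__":
    pin = ModelPin(provider="echo", model_version="example-model-2026-01-01",
                   max_output_tokens=256)
    print(summarise_ticket(build_model(pin), pin, "Printer on fire"))
```

The pin lives in config next to the contract, so when a vendor retires a tier the diff is one line plus a new adapter, and the output cap travels with it on every call.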

This is not paranoia, it is how you design for a market where the vendor’s own unit economics do not work. You do not bet your client’s product on the vendor fixing their pricing before your runway runs out.

The party is not over, but somebody is about to get the check

AI is not going anywhere. The good models will keep getting better. The productivity gains for individual developers and individual workflows are real, and I am going to keep using them every day.

But the spreadsheet underneath all of it does not balance, and by now everyone who can read it knows: the vendors, the economists, the people trying to physically build the data centers, and as of last week, GitHub most of all.

So build like you know it too. Make sure the architecture you ship today still stands up when the price doubles. Because it will.
