A $2 AI: How China's Token Pricing Is Rewriting the Inference Economy

Beijing's regulator is rewriting refinancing rules at the same moment Chinese inference prices collapse to a fraction of US levels. The two stories are connected, and they point at the same target: the cost of running the model.

By Monexus Staff Writer · Asia · 8-minute read · 3 Jul 2026

On the morning of 3 July 2026, China's securities regulator published proposed changes to the refinancing rules that govern how listed companies raise fresh capital from equity and convertible-bond markets. By the same evening, an analyst note from UBS, circulated widely through market-data terminals and excerpted by Unusual Whales, had put a number on something investors had been whispering about for months: certain Chinese frontier AI models are now priced at $2 to $3 per million output tokens, against roughly $15 for comparable American systems. The overlap in timing was almost certainly not coincidental. The two stories, a regulatory move in Beijing and a pricing fact in a Swiss bank's research note, sit on top of each other. Read together, they describe the same industrial bet: that the next phase of the AI race will be decided not by who trains the largest model, but by who can run the cheapest one.

The bet deserves to be taken seriously. The Western default assumption since the release of GPT-4 has been that training-compute scale (clusters of cutting-edge accelerators, training runs costing hundreds of millions of dollars, deep frontier-lab research benches) sets the ceiling for what AI can do. The Chinese counter-assumption, articulated less in op-eds than in product roadmaps and equity filings, is that scale at training is necessary but not sufficient; what matters commercially is the marginal cost of a token at inference. If a Chinese model can generate a million output tokens for $2 while a US rival charges $15 for the same volume, the cost arbitrage eventually overwhelms architectural advantages at the training edge. That is the argument the market is now pricing.

The regulator's move

China's refinancing rules for listed companies have been a quiet but consequential lever of industrial policy for the better part of a decade. A listed company that wants to issue new shares, raise a convertible bond, or do a private placement has to navigate a separate set of approvals, disclosures and use-of-proceeds restrictions. On 3 July China's main securities regulator proposed changes to those rules. The Reuters dispatch distributed that morning did not have the full text, but the direction of travel is well understood inside Chinese financial press: Beijing has, in successive revisions since 2023, loosened the constraints on tech-sector refinancing in particular, on the theory that capital should flow more easily toward firms the state has identified as strategically important — semiconductor fabricators, EV and battery champions, AI labs.

That direction cuts both ways. Critics in Western financial press have argued that loosening refinancing rules for Chinese tech firms amounts to state-directed subsidy, with all the trade-distortion baggage that phrase carries. The counter-argument from Chinese policy commentary, articulated in outlets such as the South China Morning Post and Xinhua's financial desk, is that the US likewise channels cheap capital to its frontier sectors through export financing, public procurement contracts, Inflation Reduction Act-style tax credits, and a deep institutional bench — pension funds, university endowments, sovereign-grade allocators — that does not exist at the same scale in China. The structural claim is that no major economy runs frontier industry on pure private capital; the question is which set of distortions one prefers. The proposed refinancing changes move Beijing one click closer to the US pattern, not away from it.

What the proposed changes will mean, in practice, is that Chinese AI labs and their listed backers will have an easier time raising equity follow-ons in the second half of 2026 than they did in 2024. UBS's pricing observation only amplifies the demand side of that: if tokens are cheap, the addressable market for inference is enormous, and the listed entity that monetises a slice of it has a defensible revenue line.

The token price and what it does

The UBS note is the more striking of the two stories. Per the analysis, certain Chinese AI models cost as little as $2 to $3 per million output tokens, against around $15 per million output tokens for comparable US models. The implication is not that Chinese models are five to seven times worse; the implication is that for a given workload (say, a customer-service chatbot fielding ten thousand conversations a day, or a financial-research tool summarising filings) the output-token bill at the end of the month is roughly one-fifth to one-seventh what it would be on a US competitor.
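
To make the monthly arithmetic concrete, here is a minimal Python sketch. The prices are the per-million-output-token figures quoted in the UBS note; the 500 output tokens per conversation, the 30-day month, and the function name `monthly_output_cost` are illustrative assumptions, not numbers from the note. Input-token charges are ignored.

```python
def monthly_output_cost(conversations_per_day: int,
                        output_tokens_per_conversation: int,
                        price_per_million_usd: float,
                        days: int = 30) -> float:
    """Return the monthly output-token bill in US dollars."""
    total_tokens = conversations_per_day * output_tokens_per_conversation * days
    return total_tokens / 1_000_000 * price_per_million_usd


# Hypothetical workload: 10,000 conversations a day,
# 500 output tokens per conversation (assumed, not from the source).
us_bill = monthly_output_cost(10_000, 500, 15.0)
cn_low = monthly_output_cost(10_000, 500, 2.0)
cn_high = monthly_output_cost(10_000, 500, 3.0)

print(f"US model at $15 per million:     ${us_bill:,.0f}")
print(f"Chinese model at $2 per million: ${cn_low:,.0f}")
print(f"Chinese model at $3 per million: ${cn_high:,.0f}")
print(f"Share of US bill: {cn_low / us_bill:.2f} to {cn_high / us_bill:.2f}")
```

Under these assumptions the workload costs $2,250 a month at the US price and $300 to $450 at the Chinese prices, which is between one-fifth and roughly one-seventh of the US bill. The ratio depends only on the prices; the assumed token count changes the dollar totals, not the gap.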

The Western reaction, where it has surfaced, has clustered around two claims. The first is that the price gap reflects subsidy: Chinese models are loss-leading because someone is paying the difference. The second is that the price gap reflects a quality gap, that the cheapest model is cheaper because it is a smaller, less capable model. Both claims are plausible. Neither is provable from public data alone. The Chinese reply, in industry forums and on platforms such as Weibo, has been that US prices reflect a training-cost amortisation regime that Chinese entrants, having entered later, do not have to replicate. The structural point is that inference pricing is a function not only of compute input cost but of what the firm has to recover from each token. A late entrant without a sunk training bill can price at marginal cost. An incumbent with a $5 billion training bill has to price at average cost or fail.
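
The difference between marginal-cost and average-cost pricing can be sketched in a few lines of Python. The $5 billion training bill is the illustrative figure used above; the $2 marginal serving cost, the lifetime token volume, and both function names are hypothetical assumptions chosen only to show the mechanics, not estimates for any real firm.

```python
def average_cost_per_million(marginal_cost_usd: float,
                             training_bill_usd: float,
                             lifetime_tokens_millions: float) -> float:
    """Marginal serving cost plus the training bill spread over every
    million tokens the model is expected to serve in its lifetime."""
    return marginal_cost_usd + training_bill_usd / lifetime_tokens_millions


def break_even_volume_millions(price_usd: float,
                               marginal_cost_usd: float,
                               training_bill_usd: float) -> float:
    """Millions of tokens that must be sold at `price_usd` before the
    per-token margin has repaid the training bill."""
    margin = price_usd - marginal_cost_usd
    if margin <= 0:
        raise ValueError("price must exceed marginal cost to recover training spend")
    return training_bill_usd / margin


MARGINAL_COST = 2.0          # USD per million tokens (assumed)
TRAINING_BILL = 5e9          # USD, the article's illustrative incumbent
LIFETIME_VOLUME = 1e9        # millions of tokens, i.e. one quadrillion (assumed)

incumbent = average_cost_per_million(MARGINAL_COST, TRAINING_BILL, LIFETIME_VOLUME)
late_entrant = average_cost_per_million(MARGINAL_COST, 0.0, LIFETIME_VOLUME)
volume_at_15 = break_even_volume_millions(15.0, MARGINAL_COST, TRAINING_BILL)

print(f"Incumbent average cost:    ${incumbent:.2f} per million tokens")
print(f"Late entrant average cost: ${late_entrant:.2f} per million tokens")
print(f"Break-even volume at $15:  {volume_at_15 / 1e6:,.0f} trillion tokens")
```

With these assumed inputs the incumbent needs about $7 per million tokens to recover its training bill, while an entrant with no sunk bill can price at the $2 marginal cost. The result is highly sensitive to the assumed lifetime volume, which is why the public price gap alone cannot settle the subsidy question.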

That structural point does not make the Chinese position automatically correct. It does make the pricing gap legible without invoking subsidy as the explanans.

How the prediction market reads it

Polymarket, the event-derivative venue, ran a market during late June and early July asking whether a Chinese company would have the number-one ranked AI model by the end of 2026. As of the snapshot circulated on 2 July, the implied probability was 11%. That is low: not 50, not 30, not 20. But it is not zero. Eleven percent is the price the market places on something the consensus has spent two years saying cannot happen. The market is also thinly traded, so read it as a sentiment gauge rather than as a probability engine.

It is worth lingering on what number one means in this context. The most widely followed Western benchmarks, the leaderboards that pop into inboxes each month, are led by US labs. A Chinese firm reaching the top of one of those boards would be a milestone, but the market would also need to ask: top of which benchmark, on which axis, and at what point in the year? Benchmarks saturate; what was a six-month gap in 2024 can be a six-week gap in 2026. The gap matters less than the price of catching up.

The structural read

Industrial-policy literature has spent two decades arguing about whether scale or agility wins. The current AI cycle is a partial answer: scale wins at training, agility arguably wins at inference, and the two are governed by different cost structures. The training side is governed by access to cutting-edge accelerators, where the US and a small group of allies have spent the last three years constructing export-control regimes. The inference side is governed by energy, by chip-generation mix at the deployment edge, by software optimisation, and by the cost structure of the firm doing the serving. The Chinese side is well placed on three of those four inputs and is improvising aggressively on the fourth. The US side is well placed on chip generation, but is structurally constrained on energy and is burdened with an amortisation cost that Chinese rivals do not carry.

The misreading to watch for is the one that treats cheap tokens as somehow cheating. They are not cheating; they are the product of an industry that is, finally, becoming a normal industry. The car industry has cheap seats; the cloud industry has cheap storage; the AI industry is about to have cheap tokens. The policy question is what that does to the labour market, the energy grid, and the geopolitics of compute. The trade question is whether frontier-lab margins hold up in a world where tokens are a commodity. The political question is whether the US will adjust its industrial-policy posture in response, or double down on training-side advantages and accept a smaller total share of inference.

The Chinese position, articulated through MFA briefings and the global editions of its state press, has been consistent for at least eighteen months: that export controls on advanced accelerators will not stop Chinese progress, will accelerate indigenous substitution, and will produce a separate inference economy in which Chinese models dominate. The verdict on that prediction is not yet in. What this article notes is that the pricing data now backs parts of it.

Stakes

The second half of 2026 will be the period in which the pricing arbitrage either compounds or collapses. If Chinese inference prices stay at roughly one-fifth to one-seventh of US prices and quality continues to converge, the inference market becomes the analogue of solar panels and EV batteries: low-margin, high-volume, and dominated by whichever jurisdiction can produce at the lowest marginal cost. The US frontier labs would then face a choice: cut prices to retain share, or retreat upmarket to the workloads where price elasticity is genuinely low. Either path has strategic consequences. A retreat upmarket preserves margins but cedes the volume market to Chinese systems; a price cut preserves share but signals to investors that the moat at inference is narrower than the moat at training.

For Chinese regulators, the refinancing rule change is the supply-side complement to that demand-side shift. Cheaper tokens need cheaper equity to scale. The proposal, if it passes, will strengthen that link. For investors globally, the question is no longer whether the AI cycle is real but which curve they are on. The curve that matters now is the inference curve, and on that curve, as of 3 July 2026, Chinese models are priced at a fraction of US rivals. That is the news.

Monexus read this against wire coverage of the regulator's refinancing proposal and UBS's pricing note on token cost. The framing — that the two belong in the same article — is the publication's own.

Wire provenance

This editorial synthesis draws on the following public wire/social posts:

http://reut.rs/4eVwEOC
https://unusualwhales.com/news/fda-approves-philip-morris-zyn-reduced-risk
https://t.me/NikkeiAsia