
The LLM-Trading Result Nobody in Silicon Valley Wants to Talk About

A new study finds LLM-driven trading strategies failed to beat a simple buy-and-hold over two decades. The result is awkward for an industry that has spent two years selling the opposite story.

By Moemedi Michael Poncana · Americas · 25 Jun 2026

On 25 June 2026, a study circulated that, taken at face value, should embarrass an entire sub-industry. Across two decades of backtested market data, large language model-based trading strategies mostly failed to outperform a simple buy-and-hold approach, according to the summary posted by the Polymarket news desk at 19:51 UTC. The framing is measured — "mostly failed" is doing real work — but the implication is sharp. A technology marketed as a generational leap in financial decision-making produced, on the evidence presented, no durable edge over the most boring strategy in retail finance.

This is not a Luddite argument. It is a structural one. The AI-trading pitch has rested, from the start, on the claim that machine intelligence can extract signal from noise at a scale and speed no human team can match. If two decades of backtesting cannot validate that claim, the burden of proof shifts, and the people selling the pitch are the ones who now have to meet it.

What the study actually says

The headline finding is straightforward. LLM-based strategies, when run against historical market conditions spanning twenty years, did not reliably beat a passive buy-and-hold benchmark. "Mostly failed" leaves room for narrow windows of outperformance and for specific market regimes where the models added value, but the central tendency is no edge over the benchmark. Twenty years is long enough to span multiple market cycles, regime changes, and shifts in liquidity. It is the kind of test that an honest pitch for AI trading should welcome.

It is also worth noting what the study does not settle. Backtested performance is not forward-looking performance. Markets evolve. The conditions that produced the result may not be the conditions an LLM strategy faces next year. The honest reading is that the marketing has outrun the evidence — not that the technology is permanently useless.

The second signal from the same day

The same news cycle carried a quieter but related datapoint. At 17:51 UTC on 25 June, Polymarket reported that OpenAI's Codex now accounts for 99.8% of weekly AI output tokens inside the company. Read the two items together and the picture sharpens. Inside the firm most identified with consumer-facing AI, internal usage has consolidated around a single coding-oriented model. The general-purpose frontier — the version of AI sold to the public as a universal reasoning engine — has, inside its own house, been narrowed to a tool.

That matters for the trading claim because the trading claim leans on exactly the general-purpose framing. An LLM that can write poetry, draft a brief, and translate a contract is supposed, by the marketing, to also be able to read a balance sheet and decide when to sell a position. If the leading lab has effectively concluded that specialised models do the actual work better, the trading thesis — which depends on versatile reasoning applied to messy, multi-domain inputs — is built on a premise the labs themselves are quietly walking away from.

What is actually being sold

The AI-trading product, looked at coldly, is mostly two things packaged together. The first is access to compute and models that retail investors could not previously afford. The second is a narrative — that intelligence, once applied to markets, produces alpha. The first is real and probably worth paying for. The second is what the new study puts pressure on.

A more skeptical reading would say that what is being sold is not alpha at all. It is the feeling of alpha. The product gives the customer a sense of engagement, a stream of decisions to make, a feeling of participation in something technical. The buy-and-hold alternative, by contrast, asks for patience and produces boredom. A product that delivers dopamine reliably will beat a strategy that delivers returns slowly, regardless of the backtest — at least until the customer checks the statement.

The structural frame

The deeper pattern here is familiar. A new technology arrives, attracts capital on a thesis of productive transformation, and spends several years producing tools that are useful but not transformational. The internet did this. Crypto did this. Cloud computing did this. The pattern is not failure; it is the gap between expectation and delivery during the build-out phase. The AI-trading variant is unusual only in how loud the pitch has been, and how visible the gap has become in real time.

There is also a media-governance layer worth naming. Coverage of AI in finance has tended to defer to the language of the firms selling the products. Demos are reported as results. Pilot programmes are reported as deployments. A backtested curve is reported as a forecast. The new study is the kind of counter-evidence that, in a healthier information environment, would have been headline material the moment it appeared. Instead it competes for attention with product announcements and funding rounds, all framed in the same vocabulary of inevitability.

What remains uncertain

The honest caveats matter. The study covers two decades, but markets evolve, and a strategy that failed to beat buy-and-hold through 2025 may behave differently in a regime shaped by AI-driven flows themselves. The summary does not specify which LLMs were tested, which data sources were used, or whether transaction costs and slippage were modelled. "Mostly failed" is also a softer claim than "failed," and the difference matters for anyone deciding whether to deploy capital on the back of it.
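To see why the transaction-cost question matters, consider a toy comparison. The sketch below is purely illustrative: the price series, the perfectly timed long-or-flat signal, and the 0.5% cost per trade are all invented assumptions, and none of it is drawn from the study or from any real strategy. It shows only the arithmetic by which a signal that beats buy-and-hold before costs can trail it after costs.

```python
# Illustrative only: synthetic prices, a hypothetical signal, and an
# assumed cost rate. None of these numbers come from the study.

def buy_and_hold_return(prices):
    """Total return from buying at the first price and holding to the last."""
    return prices[-1] / prices[0] - 1.0


def strategy_return(prices, positions, cost_per_trade=0.0):
    """Total return of a long-or-flat strategy.

    positions[i] is 1 (invested) or 0 (in cash) over the step from
    prices[i] to prices[i + 1]. Every change of position, including the
    first entry, costs cost_per_trade as a fraction of current equity.
    """
    if len(positions) != len(prices) - 1:
        raise ValueError("need exactly one position per price step")
    equity = 1.0
    previous = 0  # start in cash
    for i, position in enumerate(positions):
        if position != previous:
            equity *= 1.0 - cost_per_trade  # pay to switch
        if position == 1:
            equity *= prices[i + 1] / prices[i]  # earn the step's return
        previous = position
    return equity - 1.0


# Made-up price path: a rising market with small dips.
prices = [100.0, 102.0, 101.0, 104.0, 103.0, 107.0]
# Hypothetical signal with perfect hindsight: invested only on up steps.
positions = [1, 0, 1, 0, 1]

benchmark = buy_and_hold_return(prices)
gross = strategy_return(prices, positions)
net = strategy_return(prices, positions, cost_per_trade=0.005)

print(f"buy-and-hold:            {benchmark:.2%}")
print(f"strategy, before costs:  {gross:.2%}")
print(f"strategy, after costs:   {net:.2%}")
```

On these invented numbers, buy-and-hold returns 7%. The signal earns roughly 9.1% before costs and roughly 6.4% after five trades at the assumed rate. Even a flawless signal, in other words, can lose to the boring benchmark once trading frictions are counted, which is why a summary that is silent on costs leaves the size of the reported gap open.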

What can be said with more confidence is this: on 25 June 2026, the public case for AI-driven trading got harder to make on the strength of backtests alone. The labs building the models are, internally, voting with their compute for specialisation over generality. The macro backdrop — US Q1 GDP revised to 2.1% growth at 16:15 UTC the same day, according to the same wire — is the kind of environment in which boring strategies tend to do fine. The next chapter of this debate will be written by whoever produces the next clean piece of evidence. Until then, buy-and-hold remains, as it has been for some time, the strategy that does not need a press release.

Desk note: Monexus has framed this as a structural story about the gap between AI marketing and AI performance, rather than as a pure markets piece. The trading-floor hero image is intentional: the article is about what happens when the people inside that building stop being the smartest actors in the room.

Wire provenance

This editorial synthesis draws on the following public wire/social posts:

https://x.com/polymarket/status/...
https://x.com/polymarket/status/...
https://x.com/polymarket/status/...
https://x.com/polymarket/status/...