AI Models Struggle to Beat Bookies Over a Premier League Season

Eight advanced AI models lost money betting on a simulated English Premier League season.


Eight AI models failed to outperform bookmaker-implied probabilities when put to the test. © Getty Images

Key Facts:

  • Eight advanced AI models failed to beat the bookies betting on 2023-24 Premier League results.
  • The best performing AI system lost 11% of its bankroll over the season.
  • Two systems went bankrupt before the season ended.

Artificial intelligence may be changing the world, but a recent study suggests the technology is far from ready to beat the bookies. In fact, following the advice of some AI systems from leading technology companies could be a fast track to financial ruin.

That is the summary of findings from a recent study by London-based start-up General Reasoning. It simulated a full English Premier League season (2023-24) and asked eight advanced AI models – including systems developed by Google, OpenAI and Anthropic – to make bet selections using historical data and evolving match information.

Each model was given a virtual bankroll and instructed to maximise returns while managing risk over time. Despite access to extensive datasets – including past results, team statistics and betting odds – none of the systems ended the season in profit. In fact, several went bankrupt during the simulation.

AI Digs Two £100,000 Holes

The AI models had been fed every piece of information that quantitative bettors normally rely on: historical match results, player and team statistics, plus additional factors such as weather, travel distance and even stadium altitude.

The AI models had two markets and five possible selections to choose from: a home win, an away win or a draw in the match-result market, plus an option to bet over or under 2.5 match goals. Odds were primarily taken from a top UK online bookmaker.
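Bookmaker odds like these carry a built-in margin, so the probabilities they imply sum to more than one. A minimal sketch of how implied probabilities are typically recovered from decimal odds is below; the odds values are illustrative, not figures from the study.

```python
# Hypothetical decimal odds for a 1X2 (home/draw/away) market.
# These numbers are made up for illustration.
odds = {"home": 2.10, "draw": 3.40, "away": 3.60}

# Raw implied probabilities are the reciprocals of decimal odds.
raw = {k: 1.0 / v for k, v in odds.items()}

# Their sum exceeds 1 by the bookmaker's margin (the "overround").
overround = sum(raw.values())

# Normalising removes the margin, giving probabilities that sum to 1.
implied = {k: p / overround for k, p in raw.items()}
```

A model only has an edge if its own probability estimate for an outcome exceeds the bookmaker's implied probability; the study found the AI systems consistently failed to find such edges.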

However, despite access to extensive datasets, all of the trialled systems ended with a negative balance. Two went broke before the end of the season, losing the theoretical £100,000 they started the challenge with.

Claude Opus 4.6 'Wins' the League With an 11% Loss

The strongest-performing model was Anthropic’s Claude Opus 4.6. While it was never in positive territory over the nine-month season, its closing balance was £89,035, and at no point did its bankroll drop more than 15% below its starting point.

Similarly, OpenAI’s GPT-5.4 was never in profit after the start of October (approximately seven weeks into the season). However, its bankroll never dipped below £80,000 and it ultimately finished a close second, ending the season with £86,365.

Google’s Gemini Flash 3.1 LP got its season off to a flyer, its bankroll surpassing £250,000 within the first three weeks. By the start of October, however, it had dropped below its starting balance and never recovered, eventually finishing with £41,605.

It was Arcee Trinity that suffered the most spectacular loss. Dropping almost £60,000 on its first week of bets in mid-August, it had blown through its £100,000 by the third week of September. The second model to go bust, Grok 4.20, declined more gradually; its money was gone a month shy of the season’s end.

Fault Lies With AI’s Failure to Execute Strategies

General Reasoning’s study, titled ‘KellyBench: Can Language Models Beat the Market?’, reports that the two strongest models, Opus 4.6 and GPT-5.4, shared several traits.

Both models retrained or adjusted their strategies in response to new match data, deployed systematic staking rules, and preserved capital during periods when their strategies identified no edge.
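The benchmark's name points to the Kelly criterion, the classic rule for sizing bets in proportion to one's edge. As a hedged illustration of the kind of systematic staking rule described above, the sketch below implements a fractional Kelly stake; the function name, parameters and example numbers are assumptions for illustration, not the staking rules any particular model actually used.

```python
def kelly_fraction(p, decimal_odds, scale=0.5):
    """Fraction of bankroll to stake on a bet with estimated win
    probability p at the given decimal odds. scale < 1 gives
    'fractional Kelly', a common way to damp variance.
    Returns 0 when the bettor has no edge."""
    b = decimal_odds - 1.0          # net odds: profit per unit staked
    edge = p * b - (1.0 - p)        # expected profit per unit staked
    if edge <= 0:
        return 0.0                  # no edge: preserve capital, skip the bet
    return scale * edge / b

# Illustrative numbers (not from the study): a model estimates a 55%
# win probability at decimal odds of 2.0.
stake = kelly_fraction(0.55, 2.0)   # half Kelly stakes 5% of bankroll
```

Note the zero-stake branch: sitting out when no edge is identified is exactly the capital-preservation behaviour the study credits the two strongest models with.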

In its conclusion, the study states: “The benchmark exposes failures not only in machine learning modelling, where models struggle to outperform bookmaker-implied probabilities, but more fundamentally in the closed-loop reasoning required for long-horizon sequential decision-making.”

“Models can write sophisticated code, diagnose their own failures, and articulate correct strategies, yet persistently fail to execute those strategies reliably, monitor their own performance, or adapt when their approach is not working.”

“As well as looking at performance, we judged strategy sophistication for each model and found existing models to have unsophisticated strategies compared to human approaches.”

“In particular, rich player-level data available in the environment was almost universally ignored in favour of simpler team-level features, suggesting that current models systematically underinvest in data and feature engineering when operating autonomously.”


Roy Brindley Author and Casino Analyst
About the Author
Roy first took up poker professionally, winning two televised tournaments, becoming an author and commentating on poker coverage for many TV stations. He has also penned columns for several newspapers, magazines and online publications. Along the way he met his partner, a casino manager, and they now have two children.