In the months of sporting downtime, I have found myself wondering whether it’s possible to measure how good bettors have to be to beat the bookmaker. As usual, I’m not talking about lucky wins, but long term expected profitability via the principles of value betting.

Obviously, we know that to achieve this we have to overcome the bookmaker’s margin. The fact that such a small proportion of bettors probably manage to achieve this is evidence that it’s a pretty hard task, even when some margins are as small as 2%.

Furthermore, those who have read my articles or follow me on Twitter will be aware of several Excel tools I’ve provided to measure the statistical chances of achieving profitable expected value, and how these performances might be expected to distribute given certain betting and staking assumptions.

However, what I want to investigate this time is exactly what “being better than the bookmaker” looks like from the perspective of uncertainty. Betting, as we know, is an uncertain business, even for the best forecasters. How much less uncertain do we need to be to overcome the margin and become a long-term winner?

Through attempts at simulating an answer, it becomes clear just how good the bookmaker really is, and how good they have to be to remain profitable themselves. This is the asymmetry of bookmaking.

**Two types of uncertainty**

In my last two articles for Pinnacle, I’ve discussed two types of uncertainty. Firstly, there is aleatory or statistical uncertainty. This refers to the inherent uncertainty due to probabilistic variability. Repeat a process many times, for example tossing a coin, and subtle variations in starting conditions will render different outcomes. These differences remain unknown. Aleatory uncertainty is irreducible.

There will be some, like Laplace’s demon, who may argue that this is simply a consequence of limited information and processing power. If we could know perfectly all the starting conditions, we should be able to predict with certainty the precise outcomes.

In practical terms, the complexity of these systems reduces such information gathering to the level of the impossible. Perhaps more importantly, however, the probabilistic rather than deterministic nature of reality at tiny scales means that even theoretically this task is doomed to fail.

It is precisely for this reason that it makes sense to talk about “true” probabilities of outcomes, and not be fooled into believing those to be deterministically either 0% or 100%. Granted, in sports, unlike in coin tossing or dice rolling, we don’t, and can’t, know the “true” probabilities (hence the inverted commas), but it makes sense to talk as if they exist.

The second type of uncertainty is epistemic or model uncertainty and arises because of incomplete understanding about what we are trying to model. Epistemic uncertainty can be reduced with additional knowledge about the model.

The goal of uncertainty quantification is to reduce epistemic uncertainties to aleatoric uncertainties, although in practice because of system complexity and the probabilistic nature of reality, the boundary between the two may be fuzzy.

In his brilliant *Toward a theory of everything* piece for Pinnacle, guest author @PlusEVAnalytics has described these two types of uncertainty respectively as process (aleatory) and parameter (epistemic) uncertainty. By reproducing his example, this will hopefully clarify the difference.

*“Suppose you give a soccer team a 60% probability of winning, you bet on them at even money, and they lose. Why did you lose your bet? Perhaps you were correct in your assessment, but you were unlucky – the 40% event happened, and you lost your bet. This is process (aleatory) uncertainty – good bet, unlucky result. On the other hand, perhaps you were incorrect in your assessment – the true probability may have been 50%, or 30%, or even 1%. You made a bet that you thought was a good bet but in reality, was a bad bet. This is parameter (epistemic) uncertainty. Because the true probability is unknown, it’s very difficult to figure out how much of your results – both good and bad – are driven by process uncertainty as opposed to parameter uncertainty.”*

**Modelling uncertainty in a betting context**

In a betting environment, aleatory uncertainty will be the same for everyone. The same events take place in a sporting contest, with the same influencing variables. Every bettor lives the same history.

It’s easy enough to model aleatory uncertainty by means of a simple random number generator. Suppose we have a 50-50 contest, with fair odds of 2.00. To model aleatory uncertainty, we can just use a random number generator to output numbers between 0 and 1. Below 0.5 and it’s a winning bet; above, and it’s a losing one. The distribution of outcomes (winning and losing bets) would then follow a binomial distribution.
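This coin-toss process is easy to sketch in Python (a minimal illustration using the standard library, not code from the article):

```python
import random

random.seed(42)

N_BETS = 1_000
ODDS = 2.00     # fair odds for a 50-50 contest
P_WIN = 0.5

# Each bet wins when the random draw falls below the true win probability.
results = [random.random() < P_WIN for _ in range(N_BETS)]

# Profit per 1-unit stake: (odds - 1) on a win, -1 on a loss.
profit = sum((ODDS - 1) if won else -1 for won in results)
print(f"wins: {sum(results)}/{N_BETS}, profit: {profit:+.0f} units")
```

Repeating this experiment many times would trace out the binomial distribution of outcomes; any run-to-run variation here is pure aleatory uncertainty.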

Modelling epistemic uncertainty is a little more problematic since it’s not at all obvious what kind of distribution such errors arising from it would look like. @PlusEVAnalytics used a beta distribution to model it, but he’s a lot brighter than I am so I will resort to the laziness of the normal distribution. Furthermore, I will assume that this distribution of epistemic errors is centred about the true outcome probability, as I have described below. Of course, if systematic biases are present that will not be true.

For the bookmaker at least, this is perhaps not an unreasonable assumption since I have already shown that, for larger sporting markets at least, Pinnacle’s betting odds are highly efficient. That is to say that, on average, they reflect very accurately the underlying true outcome probabilities even if individually there will be errors. Whether this is also true for bettors is perhaps less certain.

**Distribution in epistemic uncertainty**

To model the effect of epistemic uncertainty, I have created a series of 1,000 hypothetical bets, where the true probability of winning every bet is 50%. For each bet, my hypothetical forecast model exhibits some epistemic error in its judgment of those true win probabilities, the size of which is determined by six different standard deviations: 0%, 1%, 2%, 3%, 4% and 5%. For example, for a standard deviation of 1%, just over two-thirds of modelled “true” win probabilities will be between 49% and 51%, with about 95% between 48% and 52%.

For larger standard deviations, the spread in these “true” win probabilities as produced by the forecast model will be greater, as the chart below illustrates. Obviously with a standard deviation of 0%, all win probabilities would be exactly 50% so the line is not shown. The wider the distribution, the greater the epistemic uncertainty.

It is clear from the chart that whilst each of these win probability distributions represents an efficient model – the mean is always 50% – the amount of epistemic uncertainty varies.
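Generating these distributions is straightforward; the sketch below (my own illustration, with assumed variable names) draws 1,000 modelled win probabilities for each standard deviation and confirms the mean stays near 50%:

```python
import random
import statistics

random.seed(7)

N_BETS = 1_000
TRUE_P = 0.5
SIGMAS = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05]

for sigma in SIGMAS:
    # The model's estimate of each "true" win probability is the real
    # probability plus a normally distributed epistemic error.
    estimates = [random.gauss(TRUE_P, sigma) for _ in range(N_BETS)]
    print(f"sd {sigma:.0%}: mean {statistics.mean(estimates):.3f}, "
          f"spread {statistics.pstdev(estimates):.3f}")
```

The sample means all sit close to 0.5 (the model is efficient), while the sample spreads track the chosen standard deviations (the epistemic uncertainty).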

Inverting the “true” win probabilities gives us the distribution of odds. Because of the inverse relationship between the win probability and the implied odds, they will be lognormally distributed.

In a sample of 1,000 bets this means that modelled "true" odds will typically range between 1.88 and 2.13, 1.78 and 2.28, 1.69 and 2.46, 1.60 and 2.66, and 1.52 and 2.89 for standard deviations of 1%, 2%, 3%, 4% and 5% respectively.
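These ranges can be approximated analytically: in a sample of 1,000 normal draws, roughly one value per tail is expected beyond about 3.09 standard deviations, so inverting the win probabilities at those extremes gives the typical shortest and longest odds. A quick sketch of that calculation (my own, standard library only):

```python
from statistics import NormalDist

TRUE_P = 0.5
N_BETS = 1_000

# z-value beyond which roughly one draw per tail is expected in 1,000 draws.
z = NormalDist().inv_cdf(1 - 1 / N_BETS)   # ~3.09

for sigma in (0.01, 0.02, 0.03, 0.04, 0.05):
    shortest = 1 / (TRUE_P + z * sigma)    # highest probability -> shortest odds
    longest = 1 / (TRUE_P - z * sigma)     # lowest probability -> longest odds
    print(f"sd {sigma:.0%}: odds roughly {shortest:.2f} to {longest:.2f}")
```

For a 1% standard deviation this reproduces the 1.88 to 2.13 range quoted above, and the wider ranges follow for the larger standard deviations.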

**Bookmaker vs. Bettor**

Let’s use this model of epistemic uncertainty in the true odds to build a contest between the bookmaker and the bettor. For each bet, the bookmaker publishes what they think the true odds are with a 2.5% margin reducing the price. For example, they would publish 1.95 if they thought the true price was 2.00. Over the 1,000 bets those odds will vary according to the distributions above.

The bettor has another model and uses it to estimate what they think the true odds should be. If the bookmaker’s published odds are longer than the bettor’s estimate, the bettor makes a 1-unit bet. If not, there is no bet.

For the purposes of settling the bets, the true odds for every bet, unbeknown to both bookmaker and bettor, are 2.00, and a random number generator is used to determine the outcome. As previously explained, any variance here is the consequence of aleatory uncertainty.

The contest was repeated using a Monte Carlo simulation 10,000 times. First, look at the average number of bets struck for each of the 36 different bookmaker–bettor epistemic uncertainty pairs. The greater the epistemic uncertainty (shown in the row and column headers), for either bettor or bookmaker, the more likely it is that the difference between the two models will be greater than the size of the margin, and hence the more likely it is that a bet will be struck.
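A trimmed-down version of this Monte Carlo can be sketched as follows, assuming the 2.5% margin is applied multiplicatively to the implied probability and using fewer runs than the original 10,000 (all names are my own):

```python
import random

random.seed(1)

TRUE_P, MARGIN = 0.5, 1.025      # real win probability; 2.5% margin
N_BETS, N_SIMS = 1_000, 200      # trimmed-down Monte Carlo

def contest(book_sd, bettor_sd):
    """Average bets struck per simulation and bettor yield for one pair."""
    total_bets = total_profit = 0
    for _ in range(N_SIMS):
        for _ in range(N_BETS):
            book_p = random.gauss(TRUE_P, book_sd)      # bookmaker's estimate
            bettor_p = random.gauss(TRUE_P, bettor_sd)  # bettor's estimate
            published = 1 / (book_p * MARGIN)           # margin shortens the price
            if published > 1 / bettor_p:                # bettor sees value: bet
                won = random.random() < TRUE_P          # settled at the real 50%
                total_bets += 1
                total_profit += (published - 1) if won else -1
    return total_bets / N_SIMS, total_profit / max(total_bets, 1)

bets, yield_ = contest(book_sd=0.03, bettor_sd=0.05)
print(f"avg bets struck: {bets:.0f}, bettor yield: {yield_:+.2%}")
```

Looping the final call over all 36 standard deviation pairs reproduces both tables described below, subject to aleatory noise from the reduced number of runs.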

Obviously where both the bookmaker and bettor are perfect, it is impossible for any bets to take place since the bookmaker will always be publishing 1.95 and the bettor will always know that to be shorter than the true price.

The second table shows the average (expected) yields the bettor managed to achieve for each uncertainty pair. Remember, the smaller the standard deviation, the lower the epistemic uncertainty and the better the model.

Unsurprisingly, when the bookmaker is perfect and forecasts the probability of every bet exactly, then no matter how good the bettor they will lose an amount equivalent to the size of the margin (-2.5%). The slight variability around this number is simply a consequence of aleatory uncertainty. A bigger Monte Carlo simulation would reduce it.

Also notice that where the bettor’s model is better (less uncertain) than the bookmaker’s, it’s enough to generate an expected profit. But there is also something quite puzzling that is apparent. When the bookmaker’s model is poor, the bettor can still make an expected profit even if their model is worse. For example, if the bookmaker’s epistemic uncertainty has a standard deviation of 3% in the win probability, the bettor can still expect to make +0.68% where their model has an uncertainty standard deviation of 5%. This seems to make no sense at all.

**The asymmetry of bookmaking**

To solve this paradox, we must look at how this contest has been constructed. As for any betting market, the bookmaker will lay a price. The bettor must then decide whether they will accept the challenge, doing so only if the published odds are longer than their own estimated “true” odds. Should that happen, the bookmaker doesn’t get a chance to retract the offer of the bet.

In my model scenario, if any epistemic uncertainty is present, 50% of the bookmaker’s errors will predict “true” odds which are longer than the real true odds (of 2.00) and 50% shorter. Similarly, 50% of the bettor’s errors will be either longer or shorter than 2.00.

However, when the bookmaker’s odds are shorter than 2.00, there is less opportunity for the bettor’s odds to be shorter still, thus triggering a bet. Conversely, there is more opportunity for a bet to be triggered when the bookmaker’s odds are too long.

This asymmetry leads to a greater proportion of positive expected value bets versus negative ones. The greater the epistemic uncertainty, the greater the asymmetry. When both bookmaker and bettor show a 2% standard deviation in the uncertainty, 56% of the bets have positive expected value, and the average odds bet are 2.01. When the standard deviation in uncertainty rises to 5% for both, 68% of the bets are struck at odds over 2.00, with an average of 2.10.
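The asymmetry can be checked directly by simulation; the sketch below (my own illustration, names assumed) counts the share of struck bets carrying positive expected value, and their average odds, for matched 2% uncertainties:

```python
import random

random.seed(3)

TRUE_P, MARGIN, N = 0.5, 1.025, 200_000

def asymmetry(sd):
    """Share of struck bets with positive EV, and their average odds."""
    struck = pos_ev = 0
    odds_sum = 0.0
    for _ in range(N):
        book_p = random.gauss(TRUE_P, sd)       # bookmaker's estimate
        bettor_p = random.gauss(TRUE_P, sd)     # bettor's estimate
        published = 1 / (book_p * MARGIN)
        if published > 1 / bettor_p:            # bet is struck
            struck += 1
            odds_sum += published
            if published * TRUE_P > 1:          # +EV at the real 50%
                pos_ev += 1
    return pos_ev / struck, odds_sum / struck

share, avg = asymmetry(0.02)
print(f"sd 2%: {share:.0%} of bets +EV, average odds {avg:.2f}")
```

Because bets are only triggered when the published price drifts long relative to the bettor's estimate, well over half of the struck bets land above the true odds of 2.00.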

If, instead, we run a different model where backer and layer must mutually agree to engage in a bet at odds published by a third party, this asymmetry largely disappears. Both are then also competing against the third party's model and its epistemic uncertainty. If that third party's epistemic uncertainty is small, then backer and layer, if their models are equally uncertain, will both lose the equivalent of the margin set by the third party.

**The real world of betting**

All of these conclusions hinge upon one major and probably unrealistic assumption. It has been assumed that the bookmaker’s and bettor’s models are completely independent of each other. In reality this is unlikely to be the case since modellers tend to be using similar data and similar forecasting algorithms.

If the bookmaker goes too long in their assessment of the true odds then there’s a fair chance the bettor will too, and vice versa. Some bettors may also be partially anchored to the bookmaker’s price.

Any model correlation between bookmaker and bettor will reduce the bettor’s expected value and make it harder for them to succeed.

Nevertheless, this model of epistemic uncertainty offers a clue as to how good a typical bookmaker has to be to be able to remain profitable, even with a margin on their side. Since bookmakers are unable to retract odds once the bettor has accepted them, they have to be sure that they have reduced their epistemic uncertainty to a minimum.

We can’t ever know what the bookmaker really believes the “true” odds of their markets to be. Neither can we know what the real true odds of those estimates are. Hence, we can’t precisely determine what epistemic uncertainty exists.

However, we can make an educated guess if we assume that a bookmaker’s closing odds (with margin removed) represent the real true odds. Then, the difference between pre-closing and closing odds will provide a measure of how much model error existed.
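Here is one way such an estimate might be computed, assuming proportional margin removal for a two-way market; the function names are hypothetical, and the method is verified against synthetic markets built with a known 2% error:

```python
import random
import statistics

def fair_prob(odds_a, odds_b):
    """Strip the margin from a two-way market by normalising the implied
    probabilities (proportional margin removal -- one of several possible
    methods, used here as a simplifying assumption)."""
    inv_a, inv_b = 1 / odds_a, 1 / odds_b
    return inv_a / (inv_a + inv_b)

def epistemic_sd(pre_markets, close_markets):
    """Standard deviation of the pre-closing win probability errors,
    treating the margin-free closing probability as the real true one."""
    errors = [fair_prob(*pre) - fair_prob(*close)
              for pre, close in zip(pre_markets, close_markets)]
    return statistics.pstdev(errors)

# Synthetic check: markets built with a known 2% error should return ~2%.
random.seed(0)
pre, close = [], []
for _ in range(5_000):
    p_pre = random.gauss(0.5, 0.02)          # pre-closing model error
    close.append((1 / (0.5 * 1.025), 1 / (0.5 * 1.025)))
    pre.append((1 / (p_pre * 1.025), 1 / ((1 - p_pre) * 1.025)))
print(f"estimated epistemic sd: {epistemic_sd(pre, close):.3f}")
```

Applied to real pre-closing and closing prices instead of synthetic ones, the same calculation yields the estimate discussed below.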

Taking a set of over/under pre-closing and closing Pinnacle odds for this season’s English soccer matches, removing the margin and standardising the closing odds to 2.00, the standard deviation in win probability assumed by the pre-closing odds is a shade over 2%. This is towards the low end of my modelled standard deviations and reveals again that Pinnacle’s model is pretty effective at minimising epistemic uncertainty.

To beat it, bettors will need to be at least as good, and that’s with the asymmetry on their side. If Pinnacle’s epistemic error in match-ups with 50% win probability is only 2%, there isn’t a whole lot of room for improvement by the bettor. Obviously, bettors can shift things in their favour by applying minimum expected value thresholds before triggering bets against Pinnacle. But any model correlation will make the task harder.

Once again we have shown that beating a bookmaker, and in particular Pinnacle, is not an easy task. And now we have another reason why. As the layer of odds, they don’t have the benefit their customers enjoy of picking and choosing what might be a good bet. They have to put their neck on the line every time and hope they’ve got it right. For Pinnacle, minimising epistemic uncertainty and maximising odds efficiency is the name of the game.