One game to play!

With one game each to play in the Premier League, we've updated our football animation to the new season's results.




May 22nd 2009.

The Premier League has one match left to play, with West Brom at the bottom on 31 points and Man U at the top on 87.

Part of the spread of points is due to chance. Suppose the teams were exactly equal, and each match ended as a home win, draw, or away win with chances 45%, 26%, and 29% (these are the percentages of home wins, draws, and away wins that have occurred so far this season). Then by chance alone one team would top the league and another would be bottom, and a simple calculation shows that we would expect most of the teams to get between 35 and 66 points. The animation's 'theoretical distribution' shows the expected spread if games were decided randomly using the figures 45%, 26%, and 29%. These percentages would roughly be replicated if, for example, the referee flipped a coin and declared a home win on heads; on tails the referee would flip again, declaring an away win on heads and a draw otherwise. This would save a lot of time and money, but it would not be a great spectator sport.
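The 'simple calculation' can be sketched as follows. This is one plausible reading rather than a confirmed method: it assumes 3 points for a win and 1 for a draw, 37 games split evenly between home and away, and takes 'most of the teams' to mean within two standard deviations of the expected total.

```python
from math import sqrt

# Chances of a home win, draw and away win, as quoted in the text.
P_HOME, P_DRAW, P_AWAY = 0.45, 0.26, 0.29

def mean_and_variance(p_win, p_draw):
    """Mean and variance of the points (3/1/0) one team earns in one match."""
    mean = 3 * p_win + 1 * p_draw
    mean_of_squares = 9 * p_win + 1 * p_draw
    return mean, mean_of_squares - mean ** 2

home_mean, home_var = mean_and_variance(P_HOME, P_DRAW)
away_mean, away_var = mean_and_variance(P_AWAY, P_DRAW)

# Assumption: 37 games played so far, half at home and half away.
games_each_way = 37 / 2
expected_points = games_each_way * (home_mean + away_mean)
points_sd = sqrt(games_each_way * (home_var + away_var))

# Assumption: "most teams" means within two standard deviations.
low = expected_points - 2 * points_sd
high = expected_points + 2 * points_sd
print(f"expected points {expected_points:.1f}, standard deviation {points_sd:.1f}")
print(f"most teams between {low:.0f} and {high:.0f} points")
```

Under these assumptions the range comes out at 35 to 66 points, which agrees with the figures quoted above.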

The actual spread is much larger than that, and we can say that 22% of the variability is due to chance and 78% due to genuine differences between the teams. This is a low contribution of chance, comparable with that in Greece and Turkey where the leagues contain a wide range of talent. Some leagues, in contrast, have contained teams of essentially equal ability, where the league positions at the end of the season could be totally attributable to chance. For example, in the Scottish 2nd Division in 2002-2003, after 36 games each, the teams all finished between 36 and 59 points. Poor Cowdenbeath were at the bottom, but the points show that they were really no worse than any other team, just the unluckiest.
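One way to arrive at a figure like 22% is to divide the variance expected from chance alone by the variance actually observed in the league table. The sketch below assumes that this ratio is what is meant, that 37 games are split evenly home and away, and that the observed variance is taken over the 20 teams as a whole population; the article does not spell out its exact method.

```python
from statistics import pvariance

# League points after 37 games, taken from the table in this article.
points = [87, 83, 80, 69, 60, 59, 53, 51, 48, 47,
          45, 42, 41, 41, 40, 36, 35, 34, 32, 31]

P_HOME, P_DRAW, P_AWAY = 0.45, 0.26, 0.29

def match_variance(p_win, p_draw):
    """Variance of the points (3/1/0) earned in a single match."""
    mean = 3 * p_win + p_draw
    return 9 * p_win + p_draw - mean ** 2

# Assumption: 37 games, half at home and half away, all teams equal.
chance_variance = (37 / 2) * (
    match_variance(P_HOME, P_DRAW) + match_variance(P_AWAY, P_DRAW)
)
observed_variance = pvariance(points)  # population variance of actual points

chance_share = chance_variance / observed_variance
print(f"due to chance: {chance_share:.0%}, due to team differences: {1 - chance_share:.0%}")
```

With these inputs the chance share rounds to 22%, leaving 78% for genuine differences between teams.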

But is Manchester United really the best team? Does West Brom deserve to be relegated? We can simulate what would happen if the league continued for ever and find there is a 56% chance Man U really is the best team, and 27% probability that Liverpool would end up top of the league if they went on and on playing.

We can do the same at the bottom too: Sunderland and Hull are currently not in the relegation zone, but there are probabilities of 34% and 38%, respectively, that they deserve to be there.

If you want to know more about the maths of these calculations, have a look at our pages on previous Premier League seasons.

So who is going to win this weekend?

We can use a simple statistical model to assess the chances of any particular result for all the matches to be played this weekend. The table below shows the current league on May 22nd 2009, with goals for and goals against. The average number of goals scored, and conceded, is 46. If we divide the number of goals scored by 46, we get a measure of 'attack strength', so Arsenal's 1.39 shows they have scored 39% more goals than average. If we divide the number of goals conceded by 46 we get a measure of 'defence weakness', so Stoke City's 1.11 shows they have let in 11% more goals than average.

Premier League on May 22nd 2009 after 37 games each: 'attack strength' = goals for/46, 'defence weakness' = goals against/46.

| Team | Points | Goals for | 'Attack strength' | Goals against | 'Defence weakness' |
|---|---|---|---|---|---|
| Man United | 87 | 67 | 1.46 | 24 | 0.52 |
| Liverpool | 83 | 74 | 1.61 | 26 | 0.57 |
| Chelsea | 80 | 65 | 1.41 | 22 | 0.48 |
| Arsenal | 69 | 64 | 1.39 | 36 | 0.78 |
| Everton | 60 | 53 | 1.15 | 37 | 0.80 |
| Aston Villa | 59 | 53 | 1.15 | 48 | 1.04 |
| Fulham | 53 | 39 | 0.85 | 32 | 0.70 |
| Tottenham | 51 | 44 | 0.96 | 42 | 0.91 |
| West Ham | 48 | 40 | 0.87 | 44 | 0.96 |
| Man City | 47 | 57 | 1.24 | 50 | 1.09 |
| Stoke | 45 | 37 | 0.80 | 51 | 1.11 |
| Wigan | 42 | 33 | 0.72 | 45 | 0.98 |
| Bolton | 41 | 41 | 0.89 | 52 | 1.13 |
| Portsmouth | 41 | 38 | 0.83 | 56 | 1.22 |
| Blackburn | 40 | 40 | 0.87 | 60 | 1.30 |
| Sunderland | 36 | 32 | 0.70 | 51 | 1.11 |
| Hull | 35 | 39 | 0.85 | 63 | 1.37 |
| Newcastle | 34 | 40 | 0.87 | 58 | 1.26 |
| Middlesbrough | 32 | 27 | 0.59 | 55 | 1.20 |
| West Brom | 31 | 36 | 0.78 | 67 | 1.46 |
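The two strength measures are just ratios against the league average of 46 goals. A minimal sketch, using a handful of rows from the table (the function and variable names are illustrative, not from the original):

```python
# (goals for, goals against) after 37 games, copied from the league table.
goals = {
    "Man United": (67, 24),
    "Arsenal": (64, 36),
    "Stoke": (37, 51),
    "Hull": (39, 63),
}

LEAGUE_AVERAGE = 46  # average goals scored (and conceded) per team

def strengths(goals_for, goals_against, average=LEAGUE_AVERAGE):
    """Return ('attack strength', 'defence weakness'), rounded to 2 places."""
    return round(goals_for / average, 2), round(goals_against / average, 2)

for team, (scored, conceded) in goals.items():
    attack, defence = strengths(scored, conceded)
    print(f"{team}: attack strength {attack:.2f}, defence weakness {defence:.2f}")
```

A value above 1 means more goals than the average team: good news for attack strength, bad news for defence weakness.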

We also need two other pieces of information: the average number of goals scored by a home team is 1.36, while for an away team it's 1.06.

Now suppose we want to predict the result of Hull vs Manchester United. We start by estimating how many goals Hull will score. They are playing at home, so in an average match we expect them to score 1.36. But this is not an average match: over the season they have scored only 85% of the average number of goals, and so their 'attack strength' is 0.85. Multiplying up we get 1.36 x 0.85 = 1.16. And their opposition is not average either: Manchester United's defence weakness is 0.52, since they have conceded only 52% of the average. So we get a total of 1.36 x 0.85 x 0.52 = 0.60 expected goals by Hull, which does not look too good.

For Manchester United, the baseline is 1.06, the average number of goals scored by an away team. But by the time we adjust this for Man U's attack strength and Hull's defence weakness, we get 1.06 x 1.46 x 1.37 = 2.12.
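The two multiplications above follow the same pattern: baseline for home or away, times the scoring team's attack strength, times the opponent's defence weakness. As a small sketch (function name is illustrative):

```python
HOME_AVERAGE = 1.36  # average goals scored by a home team
AWAY_AVERAGE = 1.06  # average goals scored by an away team

def expected_goals(baseline, attack_strength, defence_weakness):
    """Baseline goals scaled by own attack and the opponent's defence."""
    return baseline * attack_strength * defence_weakness

# Hull at home: their attack (0.85) against Man United's defence (0.52).
hull = expected_goals(HOME_AVERAGE, 0.85, 0.52)
# Man United away: their attack (1.46) against Hull's defence (1.37).
man_u = expected_goals(AWAY_AVERAGE, 1.46, 1.37)

print(f"Hull {hull:.2f}, Man U {man_u:.2f}")
```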

But, just like nobody has 2.4 children, nobody scores 2.12 goals - this is only an expected value, the average if the match were played again and again, heaven forbid. But we can use the Poisson distribution to distribute 100% of probability across the possible number of goals, which gives the probability distributions shown in the table below.

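| Team | Expected goals | 0 goals | 1 goal | 2 goals | 3 goals | 4 goals | 5 goals |
|---|---|---|---|---|---|---|---|
| Hull City | 0.60 | 55 | 33 | 10 | 2 | 0 | 0 |
| Man U | 2.12 | 12 | 25 | 27 | 19 | 10 | 4 |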
% probability of each team scoring a specified number of goals in the match on May 24th 2009, using a simple Poisson model.
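The percentages in the table can be reproduced directly from the Poisson formula, P(k goals) = e^(-m) m^k / k!, where m is the expected number of goals. A minimal sketch using only the standard library:

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """Probability of exactly k goals when `mean` goals are expected."""
    return exp(-mean) * mean ** k / factorial(k)

def goal_percentages(mean, max_goals=5):
    """Rounded % probability of scoring 0, 1, ..., max_goals goals."""
    return [round(100 * poisson_pmf(k, mean)) for k in range(max_goals + 1)]

print("Hull City", goal_percentages(0.60))
print("Man U    ", goal_percentages(2.12))
```

Man U's row adds up to less than 100 because a few percent of the probability sits on 6 or more goals.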

So, if next Sunday's match follows past performance, there is a 55% probability that Hull won't score at all, and a 63% probability (100 - 25 - 12) that Man U will get at least 2 goals, even though they are playing away.

To get the probability of an actual result we might assume the goals scored by each team are independent, in the sense that if we knew how many Man U scored, it would not give us any additional information about Hull's performance. This is a strong assumption and we'll come back to it in a moment, but it means that to find, for example, the probability of a 0-2 result, which is the most likely outcome, we multiply 55% by 27% to get 15% - so even the most likely result is still not very likely!
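Under the independence assumption, the probability of any scoreline is the product of the two teams' Poisson probabilities, and summing over scorelines gives the chances of a home win, draw, or away win. The sketch below assumes the expected goals of 0.60 and 2.12 from above and cuts scores off at 10 goals per team, which ignores only a negligible amount of probability.

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """Probability of exactly k goals when `mean` goals are expected."""
    return exp(-mean) * mean ** k / factorial(k)

HULL_MEAN, MAN_U_MEAN = 0.60, 2.12
MAX_GOALS = 10  # assumption: scores above this are too unlikely to matter

# Probability of each (Hull goals, Man U goals) pair, assuming independence.
scores = {
    (h, a): poisson_pmf(h, HULL_MEAN) * poisson_pmf(a, MAN_U_MEAN)
    for h in range(MAX_GOALS + 1)
    for a in range(MAX_GOALS + 1)
}

most_likely = max(scores, key=scores.get)
home_win = sum(p for (h, a), p in scores.items() if h > a)
draw = sum(p for (h, a), p in scores.items() if h == a)
away_win = sum(p for (h, a), p in scores.items() if h < a)

print(f"most likely score {most_likely[0]}-{most_likely[1]}: {scores[most_likely]:.0%}")
print(f"home win {home_win:.0%}, draw {draw:.0%}, away win {away_win:.0%}")
```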

In fact there tends to be some correlation between teams' results, in that matches have some tendency to be either high or low scoring, which we can call a pitch effect. Estimating probabilities allowing for correlations is more complicated and requires special software: the Bivariate Poisson model is popular and can be fitted using free programs. Yin-Lam Ng, in her Cambridge MPhil in Statistical Science project, has fitted models to all major league results in Europe over the last 20 years, and the predictions below for the matches next weekend are based on the best model found.

% probability of each result for the final matches of the Premier League, based on a Bivariate Poisson model. [Added on May 23rd, actual result in bold]

| Home | Away | Home win | Draw | Away win |
|---|---|---|---|---|
| Arsenal | Stoke | 72 | 19 | 10 |
| Aston Villa | Newcastle | 62 | 21 | 17 |
| Blackburn | West Brom | 54 | 23 | 23 |
| Fulham | Everton | 35 | 35 | 30 |
| Hull | Man United | 9 | 19 | 72 |
| Liverpool | Tottenham | 72 | 20 | 9 |
| Man City | Bolton | 59 | 22 | 19 |
| Sunderland | Chelsea | 10 | 25 | 65 |
| West Ham | Middlesbrough | 57 | 28 | 15 |
| Wigan | Portsmouth | 44 | 32 | 25 |
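For readers curious about the Bivariate Poisson model, one common formulation gives each team its own Poisson count and adds a shared Poisson count to both, so that the shared part produces the 'pitch effect'. The sketch below illustrates that formulation only: the parameter values are hypothetical and are not the fitted values behind the table, and the fitted model may differ in detail.

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """Probability of exactly k when `mean` is expected."""
    return exp(-mean) * mean ** k / factorial(k)

def bivariate_poisson_pmf(home, away, lam_home, lam_away, lam_shared):
    """P(home goals, away goals) where each score is the team's own
    Poisson count plus a Poisson count shared by both teams."""
    return sum(
        poisson_pmf(home - k, lam_home)
        * poisson_pmf(away - k, lam_away)
        * poisson_pmf(k, lam_shared)
        for k in range(min(home, away) + 1)
    )

# Hypothetical parameters: shared component 0.1, own components chosen so
# the expected goals stay at 0.60 and 2.12.
LAM_SHARED = 0.1
p_0_2 = bivariate_poisson_pmf(0, 2, 0.60 - LAM_SHARED, 2.12 - LAM_SHARED, LAM_SHARED)
p_1_1 = bivariate_poisson_pmf(1, 1, 0.60 - LAM_SHARED, 2.12 - LAM_SHARED, LAM_SHARED)
print(f"0-2: {p_0_2:.1%}, 1-1: {p_1_1:.1%}")
```

Setting the shared component to zero recovers the independent model described earlier, which is a useful check on the code.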

There are a few points to emphasise:

  • These statistical models assume that past performance predicts future results, and do not take into account new factors. For example, Hull City are trying to avoid relegation and Manchester United are conserving their strength having already topped the league, so Hull City may stand a much better chance of winning than the 9% we have given them. Some people obviously think so, as the odds offered by the bookies are more like 2 to 1 against, or a 33% chance.
  • These types of models have been refined over the years and are now used by bookies and sports betting companies, who employ experienced statisticians and make use of the latest computational methods. And, not surprisingly, they don't tell anyone exactly what they do!
  • One thing you can bet on - simple models like those above will be very unlikely to out-perform the odds being offered by bookies, so don't use them to spot good bets!
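The conversion from the bookies' '2 to 1 against' to a 33% chance, mentioned in the first point, is a one-line calculation. This sketch ignores the bookmaker's margin, which is a simplifying assumption; the function name is illustrative.

```python
def implied_probability(odds_against, stake=1):
    """Chance implied by fractional odds of `odds_against` to `stake` against.

    Assumption: no bookmaker's margin is built into the odds.
    """
    return stake / (odds_against + stake)

print(f"2 to 1 against implies {implied_probability(2):.0%}")
```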

Below is a table of the 4 most likely results for each match according to the statistical model. Note that the highest chance is 20%, and for most matches there's only around a 50% chance that any of these results occurs.

The 4 most likely results for each match, with their % probability according to a Bivariate Poisson model.
| Home | Away | Most likely | 2nd most likely | 3rd most likely | 4th most likely | Actual result |
|---|---|---|---|---|---|---|
| Arsenal | Stoke | 2-0 (14%) | 1-0 (13%) | 2-1 (9%) | 3-0 (9%) | 4-1 |
| Aston Villa | Newcastle | 1-0 (10%) | 2-0 (10%) | 2-1 (10%) | 1-1 (10%) | 1-0 |
| Blackburn | West Brom | 1-1 (10%) | 2-0 (10%) | 2-1 (10%) | 1-1 (10%) | 0-0 |
| Fulham | Everton | 0-0 (19%) | 1-0 (16%) | 0-1 (14%) | 1-1 (13%) | 0-2 |
| Hull | Man United | 0-2 (14%) | 0-1 (14%) | 1-2 (9%) | 1-1 (8%) | 0-1 |
| Liverpool | Tottenham | 1-0 (16%) | 2-0 (15%) | 3-0 (10%) | 2-1 (9%) | 3-1 |
| Man City | Bolton | 2-1 (10%) | 1-1 (10%) | 1-0 (10%) | 2-0 (10%) | 1-0 |
| Sunderland | Chelsea | 0-1 (20%) | 0-2 (15%) | 0-0 (13%) | 1-2 (8%) | 2-3 |
| West Ham | Middlesbrough | 1-0 (19%) | 0-0 (14%) | 2-0 (13%) | 1-1 (11%) | 2-1 |
| Wigan | Portsmouth | 1-0 (17%) | 2-0 (14%) | 0-0 (11%) | 1-1 (10%) | 1-0 |

The 'most likely' results were given on the BBC More or Less website before the matches were played. (We mistakenly gave the Fulham-Everton most-likely 0-0 prediction with probability 10% instead of 19%, and Liverpool-Tottenham's with probability 10% instead of 16%.)

Using the 'best predictions', we got 9/10 correct results, in terms of win/draw/lose, plus 2 exact scores!
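The tally can be checked against the table of most likely results. The sketch below copies the 'most likely' and 'actual result' columns in table order and counts matching outcomes and exact scores.

```python
# 'Most likely' and 'Actual result' columns, in the order of the table.
predicted = ["2-0", "1-0", "1-1", "0-0", "0-2", "1-0", "2-1", "0-1", "1-0", "1-0"]
actual = ["4-1", "1-0", "0-0", "0-2", "0-1", "3-1", "1-0", "2-3", "2-1", "1-0"]

def outcome(score):
    """Classify a 'home-away' score as a home win, draw or away win."""
    home, away = map(int, score.split("-"))
    if home > away:
        return "home win"
    if home < away:
        return "away win"
    return "draw"

correct_results = sum(outcome(p) == outcome(a) for p, a in zip(predicted, actual))
exact_scores = sum(p == a for p, a in zip(predicted, actual))
print(f"{correct_results}/10 correct results, {exact_scores} exact scores")
```

The only miss is Fulham-Everton, where a 0-0 draw was predicted and Everton won 0-2.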

Mark Lawrenson, the official BBC football expert, only got 7 correct results, and only 1 exact score.

This is a very good result for statistics! But perhaps a bit lucky....


Comments

It indeed was impressive. Shows the power of mathematics. Hope you guys publish such statistics for each EPL games. That will be awesome.

If you perform the analysis a significant number of times you should be able to eliminate luck and understand the real power of the model. Not for one round, but for a season...