Predicting the Premier League results

Here is the spreadsheet showing the way in which my predictions were made. I hope it is comprehensible, at least for enthusiasts! I discussed this on the Today programme the day before the matches.

The statistical method is basically the same as the one we used last year, when we did well and got 9 win/draw/lose results correct and 2 exact scores. For each team we work out an expected number of goals that they will score: this is based on the average for a home or away side (1.69 and 1.09 respectively this season, a strong home advantage of over 50%), adjusted by the 'attack strength' of the team and the 'defence weakness' of their opponents. The expected numbers of goals for the two teams are then taken as the means of two independent Poisson distributions, and the probability of each goal combination is calculated. Adding up the relevant probabilities then gives the assessed chances of a home win, draw, or away win.
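As a rough illustration of the calculation described above, here is a minimal Python sketch. It assumes the adjustment is multiplicative (league average times attack strength times defence weakness), which is a common choice but is not spelled out here; the function names and the strength values are hypothetical, and only the league averages of 1.69 and 1.09 come from the text.

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """Probability of exactly k goals when goals are Poisson with the given mean."""
    return exp(-mean) * mean ** k / factorial(k)

def match_probabilities(home_mean, away_mean, max_goals=10):
    """Return (home win, draw, away win) probabilities and the full score grid.

    Scores above max_goals per side are ignored; for typical means their
    total probability is negligible.
    """
    home_win = draw = away_win = 0.0
    grid = {}
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            # Independence assumption: joint probability is the product.
            p = poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
            grid[(h, a)] = p
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win, grid

# Hypothetical strengths, for illustration only (1.0 would be an average team).
home_mean = 1.69 * 1.2 * 1.1   # home average x home attack x away defence weakness
away_mean = 1.09 * 0.9 * 0.8   # away average x away attack x home defence weakness

home_win, draw, away_win, grid = match_probabilities(home_mean, away_mean)
print(f"home {home_win:.0%}, draw {draw:.0%}, away {away_win:.0%}")
```

The three probabilities add up to very nearly 1; the tiny shortfall is the chance of either side scoring more than ten goals.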

Last year we used a very simple model for attack strength and defence weakness, based on the total goals scored and conceded during the season. This year we have allowed the attack strength to depend on whether the team is playing at home or away. The easiest way to do this is to consider goals scored at home and away entirely independently, but we have 'smoothed' the resulting estimates by giving some weight to away goals when estimating home attack strength. (In formal statistical terms, we are fitting an approximate Poisson regression model with main effects for home/away, team and opposing team, and a mixed effect interaction term.)
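One way to picture the smoothing is as a weighted average of a team's home and away scoring rates, each expressed relative to the league average. The sketch below is an assumption about the general idea, not the formula in the spreadsheet: the function name is hypothetical, and the default weight of 0.33 is simply the figure a reader reports finding in the spreadsheet.

```python
def smoothed_home_attack(home_goals_per_game, away_goals_per_game,
                         league_home_avg=1.69, league_away_avg=1.09,
                         away_weight=0.33):
    """Home attack strength with some weight borrowed from away form.

    Each rate is first divided by the league average for that venue, so 1.0
    means 'average'. The weighted-average form is an illustrative assumption.
    """
    home_ratio = home_goals_per_game / league_home_avg
    away_ratio = away_goals_per_game / league_away_avg
    return (1 - away_weight) * home_ratio + away_weight * away_ratio

# Hypothetical team: prolific at home, slightly below average away.
print(round(smoothed_home_attack(2.5, 1.0), 3))
```

With `away_weight=0` this reduces to treating home and away goals entirely independently; larger weights pull the home estimate towards the team's away form.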

A real problem occurs when the 'most likely' exact score is a draw, but the most likely overall result is a win for one team. In this case we have gone for the most likely overall outcome, although this means we have not predicted any draws.
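The conflict is easy to reproduce. The sketch below assumes the predicted score is the most likely score among those consistent with the most likely overall outcome, which is one reasonable reading of the rule above; the function names and the expected-goal values of 1.5 and 1.1 are hypothetical.

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    return exp(-mean) * mean ** k / factorial(k)

def outcome(score):
    """Classify a (home goals, away goals) pair as 'home', 'draw' or 'away'."""
    h, a = score
    return "home" if h > a else "away" if a > h else "draw"

def predict(home_mean, away_mean, max_goals=10):
    """Return (most likely outcome, predicted score, single most likely score)."""
    grid = {(h, a): poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
            for h in range(max_goals + 1) for a in range(max_goals + 1)}
    totals = {"home": 0.0, "draw": 0.0, "away": 0.0}
    for score, p in grid.items():
        totals[outcome(score)] += p
    best_outcome = max(totals, key=totals.get)
    modal_score = max(grid, key=grid.get)
    # Restrict to scores that agree with the most likely overall outcome.
    consistent = {s: p for s, p in grid.items() if outcome(s) == best_outcome}
    predicted_score = max(consistent, key=consistent.get)
    return best_outcome, predicted_score, modal_score

print(predict(1.5, 1.1))  # ('home', (1, 0), (1, 1))
```

Here the single most likely score is 1-1, yet a home win is more likely than a draw overall, so the prediction becomes 1-0.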

The final predictions are as follows:

The most likely results for each match, with their % probability.

Home         Away        Most likely  Probability of result  Probability of  Actual
                         score        (win/draw/lose)        exact score     result
Arsenal      Fulham      2-0          73%                    15%             4-0
Aston Villa  Blackburn   1-0          69%                    16%             0-1
Bolton       Birmingham  1-0          39%                    11%             2-1
Burnley      Tottenham   0-2          64%                    11%             4-2
Chelsea      Wigan       4-0          96%                    11%             8-0
Everton      Portsmouth  2-0          75%                    14%             1-0
Hull         Liverpool   0-1          61%                    14%             0-0
Man U        Stoke       2-0          80%                    18%             4-0
West Ham     Man C       1-2          55%                    10%             1-1
Wolves       Sunderland  0-1          37%                    15%             2-1

Importantly, by adding up the probabilities we can work out how many predictions we expect to get right: 6.5 results and 1.3 exact scores. Anything more than this is luck!
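The expected counts are just the sums of the individual probabilities. The short check below uses the rounded percentages from the table, so it can differ slightly in the last digit from figures computed with unrounded values; the variable names are mine.

```python
# Probabilities of the predicted result and exact score, as rounded in the table.
result_probs = [0.73, 0.69, 0.39, 0.64, 0.96, 0.75, 0.61, 0.80, 0.55, 0.37]
score_probs = [0.15, 0.16, 0.11, 0.11, 0.11, 0.14, 0.14, 0.18, 0.10, 0.15]

# The expected number of correct predictions is the sum of the probabilities.
expected_results = sum(result_probs)
expected_scores = sum(score_probs)

print(f"expected correct results: {expected_results:.2f}")      # 6.49
print(f"expected correct exact scores: {expected_scores:.2f}")  # 1.35
```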

By multiplying the probabilities we can assess the chance that all the predictions will be correct: this comes to around a 1 in 100 chance that all the results will be right, and around 1 in 700 million chance that all the exact scores will be correct. That's why I don't bet on these predictions.
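The same table can be used to sketch this multiplication. It treats the ten matches as independent, as the multiplication implicitly does, and it uses the rounded percentages, so it reproduces the order of magnitude of the quoted odds rather than the exact figures. `math.prod` needs Python 3.8 or later.

```python
from math import prod

# Probabilities of the predicted result and exact score, as rounded in the table.
result_probs = [0.73, 0.69, 0.39, 0.64, 0.96, 0.75, 0.61, 0.80, 0.55, 0.37]
score_probs = [0.15, 0.16, 0.11, 0.11, 0.11, 0.14, 0.14, 0.18, 0.10, 0.15]

# Chance that every prediction is right, assuming the matches are independent.
all_results = prod(result_probs)
all_scores = prod(score_probs)

print(f"all ten results right: about 1 in {1 / all_results:,.0f}")
print(f"all ten exact scores right: about 1 in {1 / all_scores:,.0f}")
```

With these rounded inputs the first figure comes out a little over 1 in 100 and the second in the hundreds of millions.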

Added at 6pm on Sunday 9th May

A fairly pathetic result. Only 5 results right and no exact scores, fewer than expected but not incompatible with the probabilities given. It just goes to show that uncertainty does not always play out as desired. Mark Lawrenson for the BBC did better, with 6 results and 2 exact scores, so he gets his own back for last year, when he only got 7 results and 1 exact score. Oh well, back to the day job.
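To see why 5 correct results is "not incompatible" with the stated probabilities, one can work out the whole distribution of the number of correct results. The sketch below again assumes independent matches and uses the rounded table percentages; the function name is hypothetical.

```python
def count_distribution(probs):
    """Distribution of the number of successes among independent events.

    dist[k] is the probability that exactly k of the predictions are right.
    """
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)   # this prediction is wrong
            new[k + 1] += q * p     # this prediction is right
        dist = new
    return dist

result_probs = [0.73, 0.69, 0.39, 0.64, 0.96, 0.75, 0.61, 0.80, 0.55, 0.37]
dist = count_distribution(result_probs)

# Probability of getting 5 or fewer results right.
p_five_or_fewer = sum(dist[:6])
print(f"P(5 or fewer correct) = {p_five_or_fewer:.2f}")
```

A value that is far from negligible here means that a score of 5 out of 10 is unremarkable under the model.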

PREMIER-0910-BBC-DJS.xls (74.5 KB)


Anonymous's picture

Enjoyed your spot on the Today programme this morning. Have you thought of running a season-long shoot-out with Daniel Finkelstein's "fink tank"? PS Your CAPTCHA is too hard for an old man with failing eyesight.
Anonymous's picture

I think the biggest problem with the model proposed above is the assumed independence of the goals scored by the two teams. What makes football a more exciting sport to watch than tennis or golf, for example, is that once a team concedes a goal (and this in itself can happen for the most crazy and unlikely reasons, e.g. elementary goalkeeping mistakes, own goals etc.) the team that scored is instantly put at a big advantage, being in the position of only having to sit back and defend to win the game. There also seems to be an effect of goals being scored more easily against a side which has already conceded, i.e. the confidence boost to the attacking team and the "shock" effect on the defending team. Thus a better model would condition the result on the first goal scored, although this is clearly not easy to predict at all. In tennis or golf, each set or hole respectively is more or less completely independent of the previous one, so, for example, a stroke of luck in getting a hole in one on a particular hole will not significantly help you in winning any major title. Consistency in these sports is then key.
Anonymous's picture

Conditioning the prediction on which side scores the first goal sounds like Bayesian probability to me. Not difficult at all to do. An easier way is to apply the maxim that he who scores first is more likely to win (an alternative would be to say that he who scored the most recent goal wins) and use spread betting to net your gains. I do have concerns about the accuracy of prediction algorithms, having seen claims of accuracy in excess of 50% for academics' models of annual datasets. Repeating the same method on week-by-week data does not give a regular level of accuracy above 50% on the database system that I have used. However, just predicting a HOME WIN for every match gives accuracies of 54% plus. David has pointed out that, with the exception of top-flight clubs, the result of any match owes more to luck than skill. Can anyone confirm whether academic models and predictions are based on calculations from annual results (for goal averages and attack/defence factors)? If that were the case, then applying a calculation based on annual factors to any one match would be wrong, and the model flawed and inapplicable for week-by-week predictions.
Anonymous's picture

Since first hearing the predictions in 2009 I've played with occasional spreadsheets not dissimilar to yours (perhaps not as colourful!). Googling around a bit recently, there seem to be various assessments suggesting that a negative binomial distribution is a better fit to football events than a Poisson distribution (for example - ). Are you going to stick with Poisson for next May, or expand the thinking to an alternative approach?
Anonymous's picture

Hi David, you have computed figures for "home attack advantage" and "defence home advantage" in cols I and S of the strengths spreadsheet, though these do not appear to be used in the formulas that calculate the weighted strengths and weaknesses. Should these figures be used to specify the weight given to away results, depending on the teams playing? The weighting of 0.33 appears arbitrary; I cannot see how it is derived. Please forgive me if I have fundamentally misunderstood the spreadsheet, but I would be grateful for an explanation as I am currently trying to develop a similar model. I would also be indebted to anyone who can give a simple explanation of how the negative binomial distribution can be used in forecasting the outcome of a football match. In terms of failure and success, what exactly is being measured? Goals? The result (home, away or draw)? And what probability of a success is used? This too makes little sense to me. Many thanks.
Anonymous's picture

Hello. I've discovered this just today! (13th April 2011) I dream of being able to mathematically predict the best fantasy football 11 going into the week on Yahoo fantasy football. I've been working on a model to predict team scores first, then will move on to players. The problem with the model above is that I don't understand why you apply the blanket home/away advantage from across the league. Why not use the home/away advantage specific to each team, as this would seem to be more accurate? I used this model to predict next weekend's (16th April 2011) La Liga results and got the following:

Sat, Apr 16 MGA 1-1 MAL
Sat, Apr 16 MAD 1-3 BAR
Sat, Apr 16 ALM 0-2 VAL
Sat, Apr 16 GET 2-2 SEV
Sun, Apr 17 OSA 1-1 BIL
Sun, Apr 17 RSO 1-0 SPO
Sun, Apr 17 DEP 0-0 RAC
Sun, Apr 17 LEV 1-0 HER
Sun, Apr 17 ESP 1-1 AMR
Mon, Apr 18 VIL 1-0 RZG
Gambler's picture

This is a great article for noobs like me. I was trying to access the link (xls) provided but it seems to be dead. Thanks.
ernest's picture

Like the content of your article! I have been doing some research on calculating fair odds by expected goals, poisson distribution and placing value bets lately and would like to recommend the following articles (for beginners like myself):
Bonnevillet100's picture

Dear David, if the prior probabilities come from the Poisson calculations above, could they be updated using Bayes' rule (aka a Bayesian update)? The new information would be the goal difference from the past 6 matches, which could then be used as evidence so that a posterior probability could be calculated using Bayes' rule. Have you tried running a Bayesian model for football predictions, David? My effort is still under construction and a long way from being accurate. Results tend to be too home-biased. Regards, Bonnevillet100