Predicting the Premier League results


Here is the spreadsheet showing the way in which my predictions were made. I hope it is comprehensible, at least for enthusiasts! I discussed this on the Today programme the day before the matches.

The statistical method is basically the same as the one we used last year, when we did well and got 9 win/draw/lose results correct and 2 exact scores. For each team we work out an expected number of goals that they will score: this is based on the average for a home or away side (1.69 and 1.09 goals respectively this season, a strong home advantage of over 50%), adjusted by the 'attack strength' of the team and the 'defence weakness' of their opponents. The expected numbers of goals for the two teams are then taken as the means of two independent Poisson distributions, and the probability of each goal combination is calculated. Adding up the relevant probabilities then gives the assessed chances of a home win, draw, or away win.
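The calculation described above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet itself: the function names are made up, the strengths are assumed to act as multipliers on the league average (with 1.0 meaning an average team), and scores above 10 goals are ignored.

```python
import math

# League averages quoted in the post for this season.
HOME_AVG = 1.69
AWAY_AVG = 1.09


def poisson_pmf(k, mean):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def outcome_probabilities(home_attack, home_defence,
                          away_attack, away_defence, max_goals=10):
    """Return (outcome probabilities, score probabilities) for one match.

    Assumption: strengths are multiplicative adjustments to the league
    averages; 'defence' values above 1.0 mean a weaker defence.
    """
    home_mean = HOME_AVG * home_attack * away_defence
    away_mean = AWAY_AVG * away_attack * home_defence

    outcomes = {"home": 0.0, "draw": 0.0, "away": 0.0}
    scores = {}
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            # Independence: the joint probability is the product.
            p = poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
            scores[(h, a)] = p
            if h > a:
                outcomes["home"] += p
            elif h == a:
                outcomes["draw"] += p
            else:
                outcomes["away"] += p
    return outcomes, scores


if __name__ == "__main__":
    # Two perfectly average teams: only home advantage separates them.
    outcomes, scores = outcome_probabilities(1.0, 1.0, 1.0, 1.0)
    print(outcomes)
    print(max(scores, key=scores.get))
```

For two average teams the single most likely score under these assumptions is 1-1, even though a home win is the most likely overall result.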

Last year we used a very simple model for attack strength and defence weakness, based on the total goals scored and conceded during the season. This year we have allowed the attack strength to depend on whether the team is playing at home or away. The easiest way to do this is to consider goals scored at home and away entirely independently, but we have 'smoothed' the resulting estimates by giving some weight to away goals when estimating home attack strength. (In formal statistical terms, we are fitting an approximate Poisson regression model with main effects for home/away, team and opposing team, and a mixed effect interaction term.)
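One simple way to picture this kind of smoothing is a weighted average of the separate home and away estimates. The sketch below is an assumption for illustration only: the function name, the weight of 0.7 and the symmetric treatment of away strength are not taken from the spreadsheet.

```python
def smoothed_attack_strengths(home_goals, home_games,
                              away_goals, away_games,
                              league_home_avg=1.69, league_away_avg=1.09,
                              weight=0.7):
    """Return (home_strength, away_strength) after smoothing.

    Raw strengths are goals per game relative to the league average.
    `weight` is a hypothetical smoothing weight: 1.0 treats home and
    away form as entirely separate, 0.5 pools them completely.
    """
    raw_home = (home_goals / home_games) / league_home_avg
    raw_away = (away_goals / away_games) / league_away_avg
    home_strength = weight * raw_home + (1 - weight) * raw_away
    away_strength = weight * raw_away + (1 - weight) * raw_home
    return home_strength, away_strength


if __name__ == "__main__":
    # A made-up team: 38 goals in 19 home games, 19 in 19 away games.
    print(smoothed_attack_strengths(38, 19, 19, 19))
```

The effect is to pull a team's home and away estimates towards each other, which guards against reading too much into a handful of unusual home or away games.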

A real problem occurs when the 'most likely' exact score is a draw, but the most likely overall result is a win for one team. In this case we have gone for the most likely overall outcome, although this means we have not predicted any draws.
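A small sketch shows how this conflict arises and one way of resolving it. It assumes the predicted score is the most likely score among those consistent with the most likely outcome; the post does not spell out this last step, so treat it as a guess, and the function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def outcome_of(score):
    """Classify a (home_goals, away_goals) pair."""
    h, a = score
    return "home" if h > a else "draw" if h == a else "away"


def predicted_score(home_mean, away_mean, max_goals=10):
    """Return (most likely outcome, most likely score within it)."""
    scores = {
        (h, a): poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
        for h in range(max_goals + 1)
        for a in range(max_goals + 1)
    }
    totals = {"home": 0.0, "draw": 0.0, "away": 0.0}
    for score, p in scores.items():
        totals[outcome_of(score)] += p
    best_outcome = max(totals, key=totals.get)
    # Restrict to scores that agree with the chosen outcome.
    consistent = {s: p for s, p in scores.items()
                  if outcome_of(s) == best_outcome}
    return best_outcome, max(consistent, key=consistent.get)


if __name__ == "__main__":
    # League-average means: 1-1 is the likeliest single score,
    # but the prediction is a home win.
    print(predicted_score(1.69, 1.09))
```

With the league-average means of 1.69 and 1.09, the likeliest single score is 1-1 but the likeliest outcome is a home win, so this rule predicts 1-0.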

The final predictions are as follows:

The most likely results for each match, with their % probability.

Home         Away        Most likely  Probability of result  Probability of  Actual
                         score        (win/draw/lose)        exact score     result
Arsenal      Fulham      2-0          73%                    15%             4-0
Aston Villa  Blackburn   1-0          69%                    16%             0-1
Bolton       Birmingham  1-0          39%                    11%             2-1
Burnley      Tottenham   0-2          64%                    11%             4-2
Chelsea      Wigan       4-0          96%                    11%             8-0
Everton      Portsmouth  2-0          75%                    14%             1-0
Hull         Liverpool   0-1          61%                    14%             0-0
Man U        Stoke       2-0          80%                    18%             4-0
West Ham     Man C       1-2          55%                    10%             1-1
Wolves       Sunderland  0-1          37%                    15%             2-1

Importantly, by adding up the probabilities we can work out how many predictions we expect to get right: 6.5 results and 1.3 exact scores. Anything more than this is luck!

By multiplying the probabilities we can assess the chance that all the predictions will be correct: this comes to around a 1 in 100 chance that all the results will be right, and around 1 in 700 million chance that all the exact scores will be correct. That's why I don't bet on these predictions.
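Both calculations can be checked against the rounded percentages in the table. The sketch below uses only those published figures, and the variable names are made up. The products treat the ten matches as independent. Because the table is rounded, the answers differ a little from the figures quoted above, which presumably came from the unrounded spreadsheet values.

```python
import math

# Percentages copied from the table, in match order.
result_pct = [73, 69, 39, 64, 96, 75, 61, 80, 55, 37]
score_pct = [15, 16, 11, 11, 11, 14, 14, 18, 10, 15]

result_probs = [p / 100 for p in result_pct]
score_probs = [p / 100 for p in score_pct]

# Expected number correct: the sum of the individual probabilities.
expected_results = sum(result_probs)
expected_scores = sum(score_probs)

# Chance that every prediction is correct: the product of the
# probabilities, assuming the matches are independent.
p_all_results = math.prod(result_probs)
p_all_scores = math.prod(score_probs)

if __name__ == "__main__":
    print(f"Expected correct results: {expected_results:.2f}")
    print(f"Expected exact scores:    {expected_scores:.2f}")
    print(f"All results right: 1 in {1 / p_all_results:,.0f}")
    print(f"All scores right:  1 in {1 / p_all_scores:,.0f}")
```

With the rounded inputs this gives about 6.5 expected results and 1.35 expected exact scores, roughly a 1 in 110 chance of getting every result right, and roughly a 1 in 600 million chance of getting every exact score right.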

Added at 6pm on Sunday 9th May

A fairly pathetic result. Only 5 results right and no correct exact scores: fewer than expected, but not incompatible with the probabilities given. Just goes to show that uncertainty does not always play out as desired. Mark Lawrenson for the BBC did better, with 6 results and 2 exact scores, so he gets his own back for last year, when he only got 7 results and 1 exact score. Oh well, back to the day job.


Independence assumptions

I think the biggest problem with the model proposed above is the assumed independence of the goals scored by the two teams. What makes football a more exciting sport to watch than tennis or golf, for example, is that once a team concedes a goal (and this in itself can happen for the most crazy and unlikely reasons, e.g. elementary goalkeeping mistakes, own goals etc.), the other team is instantly put at a big advantage, being in the position of only having to sit back and defend to win the game. There also seems to be an effect of goals being scored more easily against a side which has already conceded a goal, i.e. the confidence boost to the attacking team and the "shock" effect on the defending team. Thus a better model would condition the result on the first goal scored, although this is clearly not easy to predict at all. In tennis or golf, each set or hole respectively is more or less completely independent of the previous one, so, for example, a stroke of luck in getting a hole in one on a particular hole will not significantly help you in winning any major title. Consistency is then the key in these sports.

Footy

Enjoyed your spot on the Today programme this morning. Have you thought of running a season-long shoot-out with Daniel Finkelstein's "fink tank"? PS Your CAPTCHA is too hard for an old man with failing eyesight.
