Skill or Chance in the Indian Premier League

by Pelham Barton

Do the results in the 2009 Indian Premier League table (20-over cricket) show more variation between the teams than one would expect if the results of single matches were completely random?

The Indian Premier League is a competition in 20-over cricket (the shortest form of the regular professional game) played between eight teams representing various Indian cities. Each team was scheduled to play every other team home and away, giving 14 league matches per team, with the top four teams then contesting two semi-finals and a final. In 2009 the tournament was moved at fairly short notice to South Africa. Teams did not have specific "home" grounds, but the full programme of 59 matches was still scheduled.

Weather permitting, each match produces a winner. There is a concept of a tied match in cricket, equivalent to a draw in football, but the nature of the scoring makes this very rare. It happened once in the 2009 tournament, and the competition rules provide for a tie-break, so this match still produced a winner. If a match is partly interrupted by bad weather, rules are in place to ensure that a winner is still found whenever possible. If there is severe disruption from the weather, there is no room in the schedule for the match to be replayed, and the match is left with no result. In 2009 this happened twice in the league matches. In each case no play took place at all.

The winner of each league match is awarded two points, with one point to each team in the case of no result. The total points scored by the teams in 2009 were 20, 17, 16, 14, 14, 13, 11, 7. Much has been written about what went right and wrong for the various teams taking part in the tournament. Not much thought appears to have been given to the question of how much variation in the teams' results would be expected on the basis of chance alone.

I set up a computer application to assign results at random to each of the 56 league matches. For each match I gave a probability of 2 in 56 of a no result (the empirical value from this season) and otherwise gave each team an equal chance of winning. I then got the computer to calculate the maximum number of points obtained by a single team from such a simulated season.
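A minimal Python sketch of such a simulation is given below. The function and parameter names are my own, not from the article; the model is the one just described: a double round-robin of 56 matches, a 2-in-56 chance of no result, and otherwise a fair coin to decide the winner.

```python
import random
from collections import Counter

def simulate_season(n_teams=8, p_no_result=2/56, rng=random):
    """Assign a random result to every match in a double round-robin.

    Each pair of distinct teams meets twice (home and away), giving
    n_teams * (n_teams - 1) = 56 matches for 8 teams.  A match is a
    no-result with probability p_no_result (1 point to each side);
    otherwise each side wins with equal probability (2 points).
    """
    points = [0] * n_teams
    for i in range(n_teams):
        for j in range(i + 1, n_teams):
            for _ in range(2):  # home and away fixtures
                if rng.random() < p_no_result:
                    points[i] += 1
                    points[j] += 1
                elif rng.random() < 0.5:
                    points[i] += 2
                else:
                    points[j] += 2
    return points

# Distribution of the highest team total over many simulated seasons
highest = Counter(max(simulate_season()) for _ in range(100_000))
```

With enough replications (the article uses 1,000,000) the counts in `highest` should approximate the shape of Figure 1; 100,000 is used here only to keep the run quick.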

Repeating this process 1,000,000 times (just a few minutes on a laptop) gave the distribution of highest scores shown in Figure 1. Sampling error is negligible compared to the size of the bars. Odd numbers are less likely than even numbers because they would require the “winning” team to have been involved in an abandoned match.

Figure 1. Simulated distribution of highest team total points for a competition the size of the Indian Premier League with random results.


The actual results had Delhi topping the table with 20 points, which was both the mode (most likely value) and the median of the simulated distribution.

The "maximum team points" is a somewhat crude statistic. It is better to use a test statistic that reflects the full variation between all eight teams in the table. Recognised measures for this are the standard deviation or its square, the variance. Because a fixed total number of points is awarded, a test based on the sum of the squares of the teams' points is equivalent to a test based on standard deviation or variance. The sum of squares can be any even number within the appropriate range. In Figure 2, the distribution is shown in groups of 10, so that the bar marked 1580 represents the range 1576 to 1584 (inclusive). It is not clear what is causing the “dip” at 1640, but this was maintained when the number of replications was increased by a further factor of 10.


Figure 2. Simulated distribution of sum of squares of team total points for a competition the size of the Indian Premier League with random results

The actual sum of the squares of the teams' points was 1676. This figure was equalled or exceeded in 44.5 percent of my simulations.
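The tail probability quoted above can be estimated with a simulation along the following lines (a sketch under the article's stated model; function and variable names are mine):

```python
import random

def simulate_season(n_teams=8, p_no_result=2/56, rng=random):
    # Random double round-robin: 2 points for a win, 1 each for a no-result.
    points = [0] * n_teams
    for i in range(n_teams):
        for j in range(i + 1, n_teams):
            for _ in range(2):  # home and away fixtures
                if rng.random() < p_no_result:
                    points[i] += 1
                    points[j] += 1
                elif rng.random() < 0.5:
                    points[i] += 2
                else:
                    points[j] += 2
    return points

# The actual 2009 table: 20, 17, 16, 14, 14, 13, 11, 7
observed = sum(p * p for p in [20, 17, 16, 14, 14, 13, 11, 7])
assert observed == 1676

n_sims = 100_000
exceed = sum(
    sum(p * p for p in simulate_season()) >= observed
    for _ in range(n_sims)
)
print(f"P(sum of squares >= {observed}) is roughly {exceed / n_sims:.3f}")
```

Note that because the total points awarded is fixed, the sum of squares is a monotone function of the variance, so the two statistics give the same test.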

I conclude that the variation between the teams in the group table is completely within the range that would be expected if the results were determined solely by chance.

About the Author

Pelham Barton

Pelham Barton

Pelham Barton has been interested in cricket for as long as he can remember. With no playing ability, he has pursued an interest in the statistics of the game and joined the Association of Cricket Statisticians in 1977. In his day job, he is currently a Senior Lecturer in Mathematical Modelling in the Health Economics Unit at the University of Birmingham.

Comments

Pelham Barton

In 2010, the tournament returned to India. The full programme of 56 group matches was played, each match giving a result, one decided by the tie-break system. The teams' points were 20, 16, 14, 14, 14, 14, 12, and 8. These results need to be compared against simulated distributions similar to those in Figures 1 and 2, but allowing for every match to have an outright winner. The replacement for Figure 1 only has even numbers of points possible. Mumbai's winning total of 20 occurred as a winning total in 37.6 percent of simulations and was equalled or exceeded in 59.7 percent of simulations: again the figure 20 was both the median and mode of the distribution of table-topping points scores. The sum of the squares of the teams' points was 1648. This was equalled or exceeded in 71.1 percent of simulations. In other words, random results were more likely than not to produce a greater variation in teams' points than the variation actually observed. Again, the overall results for the group table are completely compatible with results determined solely by chance.
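[Editorial note: the 2010 variant described in this comment differs from the 2009 simulation only in removing the possibility of a no-result. A sketch, assuming the same simulation approach as in the article, with names of my own choosing:]

```python
import random

def simulate_season_no_ties(n_teams=8, rng=random):
    # 2010 format: every match produces a winner (tie-break applied),
    # so each team's total is always even.
    points = [0] * n_teams
    for i in range(n_teams):
        for j in range(i + 1, n_teams):
            for _ in range(2):  # home and away fixtures
                winner = i if rng.random() < 0.5 else j
                points[winner] += 2
    return points

# The actual 2010 table: 20, 16, 14, 14, 14, 14, 12, 8
observed = sum(p * p for p in [20, 16, 14, 14, 14, 14, 12, 8])
assert observed == 1648

n_sims = 100_000
top_is_20 = sum(max(simulate_season_no_ties()) == 20 for _ in range(n_sims))
exceed = sum(
    sum(p * p for p in simulate_season_no_ties()) >= observed
    for _ in range(n_sims)
)
```

Here `top_is_20 / n_sims` and `exceed / n_sims` should land close to the 37.6 percent and 71.1 percent figures quoted in the comment, up to sampling error.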

Clearly the conclusion is correct. However, I find this article potentially misleading, because the results could also be consistent with genuine differences in team skill. For example, if one team is slightly better than average and another is slightly worse, then you might well end up with a similar distribution of scores to the simulations based on random performance. I think it might be possible to see this phenomenon in fairly competitive football leagues, where the shape of the distribution matches that arising from no-skill results but the teams at the top are generally recognised as better than those at the bottom.
Andrew Maclaren