Was anyone right about the pre-election polls?

There has been much wailing and gnashing of blogs since the dismal performance of the pre-election polls last week. These had confidently and consistently predicted a rough tie in vote-share between Labour and Conservative, but when the votes were counted the Conservatives had a 6.5% lead.

Comparison of vote-share in the BBC Poll of Polls on May 6th with the actual results on May 7th.
Party               Polls vote share   Actual vote share
Conservatives       34%                36.9%
Labour              33%                30.4%
UKIP                13%                12.6%
Liberal Democrats   8%                 7.9%
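
As a quick check on the arithmetic, here is a minimal Python sketch that works out the error for each party and for the Conservative-Labour lead, using only the figures in the table above. The function names are my own, chosen for illustration.

```python
# Vote shares (%) copied from the table above: (poll of polls, actual result).
shares = {
    "Conservatives": (34.0, 36.9),
    "Labour": (33.0, 30.4),
    "UKIP": (13.0, 12.6),
    "Liberal Democrats": (8.0, 7.9),
}


def poll_errors(shares):
    """Actual minus polled share for each party, in percentage points."""
    return {
        party: round(actual - polled, 1)
        for party, (polled, actual) in shares.items()
    }


def lead(shares, first, second, which):
    """Lead of `first` over `second`; which=0 uses the polls, which=1 the result."""
    return round(shares[first][which] - shares[second][which], 1)


errors = poll_errors(shares)
polled_lead = lead(shares, "Conservatives", "Labour", 0)
actual_lead = lead(shares, "Conservatives", "Labour", 1)
lead_error = round(actual_lead - polled_lead, 1)

for party, err in errors.items():
    print(f"{party:18s} {err:+.1f}")
print(f"Con-Lab lead: polls {polled_lead:+.1f}, actual {actual_lead:+.1f}, "
      f"error on lead {lead_error:+.1f}")
```

The two main parties were each out by less than 3 points, but in opposite directions, so the error on the lead is the sum of the two.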

The pre-election estimates of vote-share led May 2015 to predict that Ed Miliband would be the next Prime Minister, and even the Nate-Silver-approved Electionforecast was predicting only a small Conservative lead in seats, which could be easily outweighed by a formal or informal coalition.

Comparison of seats predicted by Electionforecast on May 6th, and actual results on May 7th.
Party                     Expected number of seats   90% interval   Actual seats won
Conservatives             278                        259-305        331
Labour                    267                        240-293        232
Scottish National Party   53                         47-57          56
Liberal Democrats         27                         21-33          8

In the end the Conservatives won a majority and the outcomes, apart from the Scottish National Party, were way outside the 90% prediction limits.
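
The comparison with the 90% intervals can be checked mechanically. This is a small Python sketch using only the numbers in the table above; the function name is illustrative.

```python
# (expected seats, interval low, interval high, actual seats), from the table above.
forecast = {
    "Conservatives": (278, 259, 305, 331),
    "Labour": (267, 240, 293, 232),
    "Scottish National Party": (53, 47, 57, 56),
    "Liberal Democrats": (27, 21, 33, 8),
}


def misses(forecast):
    """For parties outside their 90% interval, return how far outside.

    Positive values are seats above the upper limit,
    negative values are seats below the lower limit.
    """
    out = {}
    for party, (expected, low, high, actual) in forecast.items():
        if actual < low:
            out[party] = actual - low
        elif actual > high:
            out[party] = actual - high
    return out


result = misses(forecast)
for party, gap in result.items():
    print(f"{party}: {gap:+d} seats outside the 90% interval")
```

Only the Scottish National Party is missing from the output, since its 56 seats fall inside 47-57.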

Can anyone say ‘I told you so’?

Not me. I was convinced by the polls and the electionforecast model, and in a gambling frenzy even put £1 on a Lab-Lib pact at what I thought were generous odds of 8-1. William Hill were offering 9-1 against a Conservative majority, so the betting markets were taken in too, and I usually trust them: back in the Scottish Independence vote, when a single YouGov poll put the Yes vote ahead and the London government panicked and offered Scotland all sorts of goodies, the betting markets sensibly stayed firmly in favour of No.

James Morris, who conducted private polls for Labour, is reported as saying that his surveys had correctly identified the deficit for Labour. And Damian Lyons Lowe, CEO of polling company Survation, has reported a late (but previously unpublished) poll that accurately predicted the results, but ruefully acknowledged that “the results seemed so “out of line” with all the polling conducted by ourselves and our peers – what poll commentators would term an “outlier” – that I “chickened out” of publishing the figures – something I’m sure I’ll always regret.”

A lone public cautionary voice came from Matt Singh of Number Cruncher Politics, who the day before the election wrote on whether there was a 'shy Tory' effect. Based on detailed models of past election predictions and outcomes, he concluded that 2010 was an anomaly and that Conservative voting tended to be under-estimated in polls. Using surveys of choice of prime minister and views of economic competence, he concluded that “unless these relationships have completely fallen to pieces, David Cameron will be back as Prime Minister”, and that his “models all suggest Conservative leads of over 6 points”: the actual lead was 6.6%.

I wish I had read that before placing my pathetic little bet, but would I have been convinced?

What reasons have people been giving for the inaccuracy?

Having dutifully reported the polls through the campaign, the BBC's David Cowling rather grumpily says "Something is wrong. A lot of us would like to know what it is". But he doesn't have the answers, and neither does the extraordinarily contrite Ian Jones of ukgeneralelection, who admits he “got it all wrong. I spent the entire year getting it wrong. Why? … Something was missing from my calculations. Something that was not evident anecdotally, or in the polls, or arithmetically, or perhaps even rationally. It was the unexpected. I didn’t expect it. And so I got it wrong”.

Ian says his mistake was to treat the polls “not as a reflection of temporary opinion but of permanent intention”, while the BBC's Chris Cook took up the theme that the questions are too short and simple. Meanwhile Tom Clark emphasised the self-selected types who sign up for online polls, although telephone polls showed equally poor performance. But would you answer a telephone poll? I was told on More or Less that there was only a 30% response rate, and this can hardly be an unbiased sample.

The polling companies have also been garbing themselves in sackcloth and ashes. ICM, who carried out the Guardian's polls, have done a very nice deconstruction of the effects of different stages of their complex 'adjustment' process: the raw data showed a draw, but simple demographic weighting was disastrous and led to a massive Labour lead, while additional adjustments for ‘shyness’ etc merely corrected it back to a draw.

The Electionforecast model has been taken apart in the 5-38 blog, which initially said the Ashcroft poll questions made things worse, but later retracted this. Their rather plaintive conclusion was that their seat prediction would not have been too bad if only the polls had been better - I don't think Nate Silver will be adding this election to his CV.

It may be best to follow Matt Singh's advice and “avoid jumping to conclusions about what happened”, but in the meantime perhaps we could try and learn something from a statistical success story.

Why were the exit polls so good?

The exit polls have essentially been exactly right in the 2005 and 2010 elections, and were close this time, predicting that the Conservatives would win 316 seats, Labour 239, SNP 58, LibDem 10, and others 27. Details of the methods can be found on David Firth's excellent explanatory site.

Exit polls have the great advantage of targeting people who actually vote, and deal with true but unknown 'facts' rather than possibly vague intentions. But they are also of far higher quality than the standard pre-election poll, which is why they are expensive: the main one apparently costs hundreds of thousands of pounds and is jointly commissioned by the BBC, Sky and ITV. There is rigorous design, with carefully selected polling stations and personal interviewing, and there is rigorous analysis, with statistical models of change, and regression analysis using constituency characteristics.

Essentially a 'transition matrix' for each seat is estimated, representing the proportions of voters who change their vote between any two parties. These provide expected vote-shares for each seat, which are transformed to probabilities of each party winning. These are then added up to give an ‘expected number’ of seats – when they said UKIP were predicted to win 2 seats this could have been say 1.67, accumulated over a number of potential seats. [And now my chance for an ill-tempered rant. Unfortunately the BBC thought the viewers were too dim-witted to be told anything about how the exit-poll predictions were arrived at, or their margin of error. Even though there was so much empty time to fill before the results were declared, the point predictions were simply endlessly repeated. I was shouting at the screen in frustration. But I should not be surprised, as the BBC also thinks that we are too stupid to comprehend chances of rain, or sampling error in unemployment figures. I am perhaps being generous in assuming the journalists understand these concepts themselves - they prefer categorical predictions, which we know will be wrong, rather than proper communication of uncertainty. Right, glad I've got that off my chest.]
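
To make the transition-matrix and 'expected number' ideas concrete, here is a minimal Python sketch. It is not the exit-poll team's actual model: every number in it is hypothetical, the three-party seat and the list of win probabilities are invented for illustration, and it ignores turnout and new voters. The step from vote-shares to win probabilities is deliberately left out, since I don't know the details.

```python
def apply_transition(previous_shares, transition):
    """Expected new vote-shares for one seat.

    transition[i][j] is the assumed proportion of party i's previous
    voters who now vote for party j, so each row sums to 1.
    """
    parties = list(previous_shares)
    return {
        j: sum(previous_shares[i] * transition[i][j] for i in parties)
        for j in parties
    }


def expected_seats(win_probabilities):
    """Expected number of seats is the sum of per-seat win probabilities."""
    return sum(win_probabilities)


# Hypothetical seat: vote-shares at the previous election.
previous = {"Con": 0.40, "Lab": 0.35, "Other": 0.25}

# Hypothetical transition matrix (made-up numbers, rows sum to 1).
transition = {
    "Con":   {"Con": 0.90, "Lab": 0.04, "Other": 0.06},
    "Lab":   {"Con": 0.06, "Lab": 0.84, "Other": 0.10},
    "Other": {"Con": 0.12, "Lab": 0.08, "Other": 0.80},
}

new_shares = apply_transition(previous, transition)
print({party: round(share, 3) for party, share in new_shares.items()})

# Hypothetical win probabilities for one party across five potential seats,
# chosen so that they add up to the 1.67 used as an example in the text.
ukip_win_probabilities = [0.90, 0.35, 0.20, 0.12, 0.10]
total = expected_seats(ukip_win_probabilities)
print(f"expected seats: {total:.2f}, reported as {round(total)}")
```

The point of the second half is that a headline figure of '2 seats' need not mean two particular seats are expected to fall: it can be the rounded sum of several modest chances.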

What should happen in the future?

Grave doubts have been expressed about the whole basis of pre-election polling: indeed ICM wondered “whether or not telephone (but this also very much applies to online) data collection techniques remain capable of delivering a representative sample which can be modelled into an accurate election prediction”.

But there is no shortage of suggestions to patch things up, including:

  • Tom Clark suggests using broader sources of information, such as Google searches and the 'wisdom of crowds', asking "what do you think people will vote for?"
  • James Morris, Labour’s pollster, recommends asking initial ‘priming’ questions in which respondents are asked to think of important issues.
  • Survation says they got accurate results by naming the candidates through a ballot prompt specific to the respondent's constituency, by speaking only to the named person from the dataset, and by calling both mobile and landline telephone numbers to maximise the “reach” of the poll.
  • Observing that polls are simply a case of consumer behaviour, Retail Week recommends doing “tracker surveys” which, like exit polls, essentially model the transition matrix.
  • Number Cruncher Politics points to three areas for more careful adjustment of raw data: the use of past votes, ‘shy’ voters, and likelihood to vote.
  • The 5-38 blog says “We need to include even more uncertainty about the national vote on election day relative to the polls.”

My personal view

It is a huge shame to see statistics being brought into disrepute. And these numbers are important. Perhaps the top levels of the political parties themselves were more savvy, but these polls were being given great prominence by the media and certainly influenced popular perceptions, and no doubt voting behaviour.

It seems extraordinary that such high-profile polls are done on the cheap for the media - if this is supposed to be PR for the polling companies, it has hardly been a great success. And it's no use claiming the polls are merely snapshots of intention - it is unavoidable that they will be treated as predictions, and this is reflected in companies doing fancy adjustments to try and deal with don't-knows or refusals.

The problem appears to be systematic bias rather than margin of error, so simply increasing sample sizes would only make things worse: the intervals would get narrower, but they would still be centred on the wrong answer. Instead, there are clearly fundamental issues in the design of the data-collection and in the subsequent analysis, and all of this will no doubt be considered by the investigation being carried out by the British Polling Council. My own gut feeling is that it may be better to construct longitudinal tracker surveys based on careful interviewing in order to estimate changes in voting intention, essentially adapting the exit poll methodology to pre-election polls. This will make the polls more expensive, but there are far too many anyway.
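
The distinction between bias and margin of error can be shown with a few lines of Python. This is a sketch under a strong simplifying assumption of my own, namely simple random sampling, which real polls do not use. It takes the Conservative figures from the first table (polls 34%, actual 36.9%) and asks whether a textbook 95% interval centred on the polled share would contain the actual share, for various sample sizes.

```python
import math


def half_width_95(p, n):
    """Half-width of the usual 95% interval for a proportion,
    assuming simple random sampling (an assumption, not how polls work)."""
    return 1.96 * math.sqrt(p * (1 - p) / n)


def interval_covers_truth(poll_mean, truth, n):
    """True if an interval centred on the (biased) poll mean contains the truth."""
    return abs(truth - poll_mean) <= half_width_95(poll_mean, n)


poll_mean = 0.34   # Conservative share in the poll of polls
truth = 0.369      # Conservative share in the actual result

for n in (1_000, 10_000, 100_000):
    hw = half_width_95(poll_mean, n)
    covered = interval_covers_truth(poll_mean, truth, n)
    print(f"n={n:>7,}: margin of error ±{100 * hw:.1f} points, "
          f"covers actual result: {covered}")
```

With a bias of this size, a sample of a thousand only just covers the actual result, and anything larger confidently excludes it. A bigger sample shrinks the margin of error but does nothing to the bias.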

If the industry wants to improve its reputation then it needs to take proper control of what goes on, or maybe face Lord Foulkes's proposed Bill in the House of Lords to set up an independent regulator for the polling industry, with powers to ban polls in the run-up to elections, as in France, India, Italy and Spain.

But will this be sorted out before the EU referendum? Or will the polling companies again be providing trough-loads of cheap and cheerful material to fill the media? Of course this may be an easier vote to forecast, but the miserable performance in the 2015 election should not be forgotten.