Why it’s important to be pedantic about sigmas and commas
The BBC reported last week that evidence for the Higgs Boson is “around the two-sigma level of certainty” and provided further explanation:
“Particle physics has an accepted definition for a "discovery": a five-sigma level of certainty. The number of standard deviations, or sigmas, is a measure of how unlikely it is that an experimental result is simply down to chance rather than a real effect”
This is nice and clear, but it is also wrong, as we have pointed out in a previous blog post by Kevin McConway.
The number of sigmas does not say 'how unlikely the result is due to chance': it measures 'how unlikely the result is, due to chance'.
The additional comma may seem staggeringly pedantic (and indeed statisticians have been accused of being even more pedantic about language than lawyers). So what is the problem?
The first, incorrect, 'how unlikely the result is due to chance' applies the term 'unlikely' to the whole phrase 'the result is due to chance', i.e. it says that the hypothesis that the Higgs Boson does not exist is unlikely, or equivalently that it is likely the Higgs Boson exists.
The second, correct, 'how unlikely the result is, due to chance' applies the term 'unlikely' to the data, and just says that the data is surprising, if the Higgs Boson does not exist. It does not imply that it is necessarily likely that the Higgs Boson exists.
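To put rough numbers on 'how unlikely the result is, due to chance', here is a minimal sketch that converts a number of sigmas into a tail probability. It assumes the one-sided convention usually quoted in particle physics and a normally distributed test statistic; the function name `sigma_to_p_value` is made up for illustration.

```python
import math


def sigma_to_p_value(sigma: float) -> float:
    """One-sided upper-tail probability of a standard normal beyond `sigma`.

    Assumption: the one-sided convention. This is P(data at least this
    extreme | no real effect), NOT P(no real effect | data).
    """
    return 0.5 * math.erfc(sigma / math.sqrt(2))


for sigma in (2, 5):
    p = sigma_to_p_value(sigma)
    print(f"{sigma} sigma: P(data this extreme, if only chance) ~ {p:.3g}")
```

Under these assumptions, two sigma corresponds to a probability of about 0.023 and five sigma to about 0.0000003. Both numbers describe how surprising the data would be if there were no Higgs Boson; neither is the probability that there is no Higgs Boson.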
Take Paul the Octopus, who correctly predicted 8 football results in a row, which is unlikely (probability 1/256), due to chance. Is it reasonable to say that these results are unlikely to be due to chance (in other words that Paul is psychic)? Of course not, and nobody said this at the time, even after this 2.5 sigma event. So why do they say it about the Higgs Boson?
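The Paul example can be sketched with Bayes' theorem to show why the two readings differ. Everything beyond the 1/256 is an assumption made purely for illustration: a prior probability of one in a million that an octopus is psychic, and a psychic octopus that always predicts correctly. The sigma conversion again assumes the one-sided convention, under which 1/256 comes out a little above the rough '2.5 sigma' quoted above.

```python
from statistics import NormalDist

# P(8 correct predictions | pure chance): eight fair coin flips.
p_data_given_chance = 0.5 ** 8  # = 1/256

# Assumption: a genuinely psychic octopus is always right.
p_data_given_psychic = 1.0

# Assumption: hypothetical prior of one in a million that Paul is psychic.
prior_psychic = 1e-6

# Equivalent number of sigmas for a one-sided tail probability of 1/256.
sigma = NormalDist().inv_cdf(1 - p_data_given_chance)


def posterior_psychic(prior: float, p_data_psychic: float, p_data_chance: float) -> float:
    """P(psychic | data) via Bayes' theorem for two competing hypotheses."""
    numerator = prior * p_data_psychic
    return numerator / (numerator + (1 - prior) * p_data_chance)


posterior = posterior_psychic(prior_psychic, p_data_given_psychic, p_data_given_chance)

print(f"P(data | chance)  = {p_data_given_chance:.5f} (about {sigma:.1f} sigma)")
print(f"P(psychic | data) = {posterior:.6f}")
```

With these made-up inputs the data are unlikely, due to chance (about 0.004), yet the probability that Paul is psychic stays below one in a thousand. The answer depends heavily on the prior, which is exactly the information the number of sigmas leaves out.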
This is important: people have been wrongly convicted of murder because this comma was left out. The comma needs to be in there.