Why it’s important to be pedantic about sigmas and commas

The BBC reported last week that evidence for the Higgs Boson is “around the two-sigma level of certainty” and provides further explanation:

“Particle physics has an accepted definition for a "discovery": a five-sigma level of certainty. The number of standard deviations, or sigmas, is a measure of how unlikely it is that an experimental result is simply down to chance rather than a real effect.”

This is nice and clear, but it is also wrong, as we have pointed out in a previous blog by Kevin McConway.

The number of sigmas does not say 'how unlikely the result is due to chance': it measures 'how unlikely the result is, due to chance'.

The additional comma may seem staggeringly pedantic (and indeed statisticians have been accused of being even more pedantic about language than lawyers). So what is the problem?

The first, incorrect, 'how unlikely the result is due to chance' applies the term ‘unlikely’ to the whole phrase ‘the result is due to chance’, i.e. it says that the hypothesis that the Higgs Boson does not exist is unlikely, or equivalently that it is likely the Higgs Boson exists.

The second, correct, 'how unlikely the result is, due to chance' applies the term 'unlikely' to the data, and just says that the data is surprising, if the Higgs Boson does not exist. It does not imply that it is necessarily likely that the Higgs Boson exists.
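To see why the two readings differ, here is a minimal sketch using Bayes' theorem. It is an illustration only, not part of the physicists' analysis: the function name `posterior_null` and every number in it are made up for the example. The point is that the same surprising data (the "with comma" probability) can leave the "no real effect" hypothesis either unlikely or still very likely (the "without comma" probability), depending on how plausible a real effect was beforehand.

```python
def posterior_null(p_data_given_null, p_data_given_effect, prior_null):
    """Return P(no real effect | data) via Bayes' theorem.

    p_data_given_null   -- how unlikely the data is, due to chance
    p_data_given_effect -- probability of the data if the effect is real
    prior_null          -- probability of 'no real effect' before seeing data
    """
    prior_effect = 1.0 - prior_null
    numerator = p_data_given_null * prior_null
    denominator = numerator + p_data_given_effect * prior_effect
    return numerator / denominator


# Hypothetical numbers, chosen only to illustrate the distinction.
P_DATA_GIVEN_NULL = 0.01    # the data is surprising if there is no effect
P_DATA_GIVEN_EFFECT = 0.5   # the data is unremarkable if the effect is real

for prior_null in (0.5, 0.99, 0.9999):
    post = posterior_null(P_DATA_GIVEN_NULL, P_DATA_GIVEN_EFFECT, prior_null)
    print(f"prior P(no effect) = {prior_null}: "
          f"P(no effect | data) = {post:.3f}")
```

In every row the data is equally unlikely, due to chance (0.01), yet the probability that the result is due to chance ranges from small to nearly certain.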

Take Paul the Octopus, who correctly predicted 8 football results in a row, which is unlikely (probability 1/256), due to chance. Is it reasonable to say that these results are unlikely to be due to chance (in other words that Paul is psychic)? Of course not, and nobody said this at the time, even after this 2.5 sigma event. So why do they say it about the Higgs Boson?
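For readers who want to check the arithmetic, the sketch below converts Paul's 1/256 into a number of sigmas using Python's standard library. It assumes each prediction is an independent 50:50 guess (so draws are ignored), and the variable names are ours. The exact sigma figure depends on whether a one-sided or two-sided convention is used, which is why quoted values differ a little; both come out between 2.5 and 3.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability of 8 correct predictions in a row by pure guessing,
# assuming each one is an independent 50:50 call.
p_paul = 0.5 ** 8  # = 1/256

# Sigma equivalent: the z-value whose tail area equals that probability.
sigma_one_sided = std_normal.inv_cdf(1 - p_paul)      # roughly 2.7
sigma_two_sided = std_normal.inv_cdf(1 - p_paul / 2)  # roughly 2.9

# For comparison, the one-sided tail probability beyond five sigma.
p_five_sigma = std_normal.cdf(-5)

print(f"P(8 in a row | guessing) = {p_paul:.5f}")
print(f"one-sided sigma equivalent = {sigma_one_sided:.2f}")
print(f"two-sided sigma equivalent = {sigma_two_sided:.2f}")
print(f"P(beyond 5 sigma | chance) = {p_five_sigma:.2e}")
```

Every number printed here is a statement about how unlikely the data is, due to chance. None of them is the probability that Paul was merely guessing.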

This is important: people have been wrongly convicted of murder because this comma was left out. The comma needs to be in there.

Comments

Chuckk's picture

"...How unlikely it is that an experimental result is simply down to chance rather than a real effect." If it is unlikely to be "simply down to chance," then it is likely to be "a real effect." Which seems to match this:
https://news.slac.stanford.edu/features/word-week-five-sigma
"...Researchers plot the probability that their interesting lump or bump is due to chance alone."
"If that point is more than five sigma....from the center of the bell curve, the probability of it being random is smaller than one in one million."
That makes it sound like they are in fact referring to the probability that the results came about through chance.
In the case of Paul, if he had correctly predicted 100 games in a row, any scientist would indeed say that this was very unlikely to be due to chance, but that some other factor made the results themselves very likely. A good scientist would probably stop short of saying the octopus was psychic, though.

david's picture

Sorry, SLAC have it wrong too. It's the common misunderstanding of what a P-value is.

PaulB's picture

I can't make sense of your suggested phrase with the comma in. I can't see anything incorrect in saying "The number of standard deviations...is a measure of how unlikely it is that an experimental result could occur by chance", where the 'experimental result' simply means the data observed. I think "down to chance" is an informal way of saying the same thing. The main problem with the BBC's phrasing is the added "...rather than a real effect", because that puts us in a position where we are asked to choose between two causes for what happened, so we need to compare the a priori probabilities.

Dornfeld's picture

I think the distinction lies in the fact that without the comma, it is suggesting that an experimental result is unlikely to be the result of chance, which is not the correct way to view it. Rather it should be viewed as, given randomness, the result is unexpected, which is what the comma implies. Sigma values of an experimental outcome should be viewed as a quantification of how unexpected the result is.

Glenn's picture

I am an AP Statistics teacher in the US, and I am trying to hunt down the source of that quote about the requirements for particle physics and 5 sigmas. I think it is important to discuss with my class (when we reach that point) that the p value does not mean anything magical or special; it is just a statement about randomness. The quote is everywhere around the web, but I can't find where it originates. Do you have any guess or idea where I could find it?

Chris t's picture

Maybe Paul the Octopus got lucky the first few times, then worked out which teams had the best chance of winning from then on, as he stuck with Germany to do well after they beat Australia 4-0, but they lost to Serbia 1-0. Very clever how the comma changes the whole sentence.