Explaining 5-sigma for the Higgs: how well did they do?

Warning, this is for statistical pedants only.

To recap, the results on the Higgs are communicated in terms of the number of sigmas, which has been calculated by the teams from what is generally (outside the world of CERN) termed a P-value: the probability of observing such an extreme result, were there not really anything going on. 5 sigma corresponds to around a 1 in 3,500,000 chance. This tiny probability is applied to the data, but the common misinterpretation is to apply it to the explanation, and to say that there is only a 1 in 3,500,000 probability that the results were just a statistical fluke, or some similar phrase. This distinction may seem pedantic, but as covered in numerous articles and blogs (see for example Carlisle Rainey), it is important.
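For anyone who wants to check the '1 in 3,500,000' figure, here is a minimal sketch. It assumes the conversion is simply the one-sided upper tail of a standard normal distribution, which is the usual convention in particle physics; the function name `one_sided_tail` is my own and not anything used by the CERN teams.

```python
import math


def one_sided_tail(n_sigma: float) -> float:
    """Upper-tail probability P(Z >= n_sigma) for a standard normal Z."""
    # erfc gives the two-sided tail of |Z| / sqrt(2), so halve it for one side.
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


p = one_sided_tail(5.0)
print(f"one-sided P-value at 5 sigma: {p:.2e}")
print(f"that is roughly 1 in {1.0 / p:,.0f}")
```

The result is a statement about the data under the assumption of no Higgs, and nothing more: the calculation never uses any information about how plausible the Higgs was to begin with.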

The reports from the CERN teams were very clear: the CMS team said

“CMS observes an excess of events at a mass of approximately 125 GeV with a statistical significance of five standard deviations (5 sigma) above background expectations. The probability of the background alone fluctuating up by this amount or more is about one in three million.”

while the ATLAS group reported

“A statistical combination of these channels and others puts the significance of the signal at 5 sigma, meaning that only one experiment in three million would see an apparent signal this strong in a universe without a Higgs.”

However, the CERN press release does not give any help with the interpretation, and just says

“We observe in our data clear signs of a new particle, at the level of 5 sigma, in the mass region around 126 GeV.”

How did everyone else do?

The BBC did very well. Tom Feilden got it dead right on the Today programme, and on the BBC website Paul Rincon said

“They claimed that by combining two data sets, they had attained a confidence level just at the "five-sigma" point - about a one-in-3.5 million chance that the signal they see would appear if there were no Higgs particle.”

In the explanation they say

“The number of sigmas measures how unlikely it is to get a certain experimental result as a matter of chance rather than due to a real effect”

which is ambiguous, but would be improved by a comma after 'result'.

The Numbers Guy (Carl Bialik) in the Wall Street Journal provides a nice explanation of the issue, saying of the '1 in 3.5 million chance'

That is not the probability that the Higgs boson doesn't exist. It is, rather, the inverse: If the particle doesn't exist, one in 3.5 million is the chance an experiment just like the one announced this week would nevertheless come up with a result appearing to confirm it does exist.

although the additional statement is not so good:

In other words, one in 3.5 million is the likelihood of finding a false positive—a fluke produced by random statistical fluctuation

which puts the probability on the explanation ('fluke') rather than the data.
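To see why the two statements are not interchangeable, here is a rough sketch using Bayes' theorem. Everything in it apart from the 1 in 3.5 million figure is a hypothetical assumption chosen purely for illustration: the prior probabilities that there is no Higgs, the 50% chance of reaching 5 sigma if the Higgs does exist, and the function name `prob_fluke_given_signal`. It also simplifies matters by treating 'the experiment reaches 5 sigma' as the observed event.

```python
def prob_fluke_given_signal(p_value: float, prior_null: float, power: float) -> float:
    """P(no real effect | signal at least this strong), by Bayes' theorem.

    p_value    : P(signal this strong | no real effect)
    prior_null : hypothetical prior probability that there is no real effect
    power      : hypothetical P(signal this strong | real effect)
    """
    false_positive = p_value * prior_null
    true_positive = power * (1.0 - prior_null)
    return false_positive / (false_positive + true_positive)


P_VALUE = 1 / 3_500_000  # the 5 sigma figure quoted by the news sources

# Hypothetical priors: the same P-value gives very different answers.
for prior_null in (0.5, 0.99, 0.999999):
    posterior = prob_fluke_given_signal(P_VALUE, prior_null, power=0.5)
    print(f"prior P(no Higgs) = {prior_null}: P(fluke | signal) = {posterior:.2e}")
```

The probability that the result is a fluke moves around with the assumed prior and power, whereas the P-value stays fixed, which is exactly why it should be attached to the data and not to the explanation.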

As far as I can see, every other news source gets the interpretation wrong - see also examples in Carlisle Rainey's blog. The New York Times said

Both groups said that the likelihood that their signal was a result of a chance fluctuation was less than one chance in 3.5 million, “five sigma,” which is the gold standard in physics for a discovery.

The Daily Telegraph reported

Dr James Gillies, Cern’s communications director, says that talk of a discovery is “premature” and that any event would need to reach the “five sigma” level, an expression of statistical significance used by physicists, meaning it is 99.99997 per cent likely to be genuine rather than a fluke.

which I hope was not a quote from Cern’s communications director.

The Independent was typical

meaning that there is less than a one in a million chance that their results are a statistical fluke.

but I expected better from New Scientist, with their

There's a 5-in-10 million chance that this is a fluke.

Live Science had

The level of significance called sigma found for the new particle in the ATLAS experiment. A 5 sigma means there is only a 1 in 3.5 million chance the signal isn't real.

while Forbes Magazine reported

The chances are less than 1 in a million that it’s not the Higgs boson.

The BBC has shown it is not too tricky to get it right: it is a shame that people don't seem to care.


HMichaelPower's picture

Hi David, I am a great fan of your work, but I have to make a comment about an overlooked assumption in your blog post. "The probability of the background alone fluctuating up by this amount or more is about one in three million." This assumes that there was no bias in the measurements. This should not be brushed aside as a pedantic quibble. The risk of bias may be small in measurements made by CERN, where I am sure they will be checked, and cross-checked, and correlated. But in softer sciences such as medicine, bias (such as unconscious spin and conscious manipulation) is a very real risk, whatever the P-value.
david's picture

A very reasonable comment.

The CERN teams claim that the P-value allows for systematic biases, but of course that assumes they have them right!

Bill Jefferys's picture

Standard alpha-level significance testing assumes that the number of samples is set in advance. This was not the case here. No attempt was made, as far as I know, to control for "data-peeking". http://www.science20.com/quantum_diaries_survivor/blog/keeplooking_bias
Oliver Kuss's picture

Dear David, thank you very much for your informative comments around the statistics of the Higgs. As you explicitly invited the statistical pedants, here is my pedantry: Your definition of the p-value should rather say "the probability of observing such an extreme (or even more extreme) result, were there not really anything going on". Yours, Oliver
Simon Vaughan's picture

A question from a non-particle physicist. Given that the variance in the data is not known a priori but is also obtained from the data, should we be talking about the number of t's rather than sigmas? (This really is pedantry because there's a lot of data here, and as N increases the t distribution tends to the normal distribution.)
Perhaps's picture

I think the Cern statement would be much clearer with "frequency" instead of "probability": "The frequency of the background alone fluctuating up by this amount or more is about one in three million." The probability of the background fluctuating up by any amount or more somewhere is one, if the number of observations goes to infinity. In my eyes, the Cern statement should be complemented with the total number of calculated p-values and the probability that at least one of them is smaller than or equal to the observed minimum.
Perhaps's picture

I guess "rate" would be better than "frequency", but I hope my point is clear anyway.
Tom's picture

Hi! How does one actually arrive at the 5-sigma (or the chance of being wrong on the level of 1 in 3.5 Million) number, if only 400 Higgs-like events were found so far (out of 6 Trillion p-p collisions)? In other words, from which numbers was the 5-sigma number calculated? -Tom.