Maths of coincidence

As of the 23rd May 2022 this website is archived and will receive no further updates.

understandinguncertainty.org was produced by the Winton programme for the public understanding of risk based in the Statistical Laboratory in the University of Cambridge. The aim was to help improve the way that uncertainty and risk are discussed in society, and show how probability and statistics can be both useful and entertaining.

Many of the animations were produced using Flash and will no longer work.

In "What are the chances?" we saw how the chance of a rare event occurring could be calculated for specific problems.

To generalise the specific results shown in "What are the chances?", we again need to treat repeated independent events separately from matching problems.

Repeated independent events

Suppose we calculate the chance of an event happening in a specific situation to be $p$, where $p$ is small: for example, the chance of a surprising coincidence happening to the particular person who experienced it.

Suppose we then work out that there were $M$ opportunities for a similar event to occur.

Then the expected number of such events is $Mp$, which we shall label $E$.

The probability of at least one such event occurring is 1 minus the probability that no such event occurs. Since the opportunities are assumed to be independent, the probability that none occurs is the product of the probabilities of each not occurring, which is $(1-p)^M$. So, overall,

Probability of at least one such event = $1 - (1-p)^M$.

For large $M$ and small $p$, this is approximately $1 - e^{-Mp} = 1 - e^{-E}$, where $e \approx 2.718$ is the base of natural logarithms (available on calculators and spreadsheets).
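To see how close the approximation is in practice, here is a minimal Python sketch comparing the exact expression $1-(1-p)^M$ with $1-e^{-Mp}$. The function names and the values $p = 1/1000$, $M = 2000$ (so $E = 2$) are illustrative choices, not taken from the text.

```python
import math

def prob_at_least_one_exact(p: float, M: int) -> float:
    """Exact chance of at least one event in M independent opportunities, each with chance p."""
    return 1 - (1 - p) ** M

def prob_at_least_one_approx(p: float, M: int) -> float:
    """Approximation 1 - exp(-E), where E = M * p is the expected number of events."""
    return 1 - math.exp(-M * p)

p, M = 1 / 1000, 2000  # illustrative values giving E = Mp = 2
print(f"exact:  {prob_at_least_one_exact(p, M):.4f}")
print(f"approx: {prob_at_least_one_approx(p, M):.4f}")
```

With these values the two results agree to about three decimal places; the gap shrinks further as $p$ gets smaller with $E$ held fixed.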

This makes it very easy to work out the chance of at least one 'rare' event occurring, as in the following table.

Table showing how the expected number of events tells you the chance of no event occurring

| Expected number of events, $E$ | Chance no events occur | Chance at least one event occurs |
|---|---|---|
| $1$ | 37% | 63% |
| $2$ | 13% | 87% |
| $3$ | 5% | 95% |
| $4$ | 2% | 98% |
| $5$ | 1% | 99% |
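The table can be regenerated from the approximation $e^{-E}$ alone. The sketch below uses a hypothetical helper `chance_table` and prints the values to one decimal place, so it shows slightly more precision than the whole-number percentages above.

```python
import math

def chance_table(max_expected: int = 5):
    """Rows of (E, chance of no event, chance of at least one event), using exp(-E)."""
    rows = []
    for E in range(1, max_expected + 1):
        p_none = math.exp(-E)  # large-M, small-p approximation to (1 - p)^M
        rows.append((E, p_none, 1 - p_none))
    return rows

print(f"{'E':>2}  {'P(none)':>8}  {'P(at least one)':>15}")
for E, p_none, p_some in chance_table():
    print(f"{E:>2}  {p_none:>8.1%}  {p_some:>15.1%}")
```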

We could also obtain this result directly by noting that, for large $M$ and small $p$, the total number of events observed approximately follows a Poisson distribution with expectation $E$.

This means that the chance of no events occurring at all is $e^{-E}$.
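The Poisson claim can be checked numerically: for large $M$ and small $p$, the exact (binomial) chance of exactly $k$ events should be close to the Poisson value $e^{-E}E^k/k!$. A sketch, with $M = 1000$ and $p = 0.003$ (so $E = 3$) chosen purely for illustration:

```python
import math

def binomial_pmf(k: int, M: int, p: float) -> float:
    """Exact chance of exactly k events in M independent opportunities."""
    return math.comb(M, k) * p**k * (1 - p) ** (M - k)

def poisson_pmf(k: int, E: float) -> float:
    """Poisson chance of exactly k events when the expected number is E."""
    return math.exp(-E) * E**k / math.factorial(k)

M, p = 1000, 0.003  # illustrative values
E = M * p
for k in range(6):
    print(k, f"binomial {binomial_pmf(k, M, p):.4f}", f"Poisson {poisson_pmf(k, E):.4f}")
```

The $k = 0$ row is exactly the "chance no events occur" from the table, computed both ways.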

Why does anyone win the lottery?

Suppose there are $aN$ lottery tickets sold, each with a chance $1/N$ of winning, independently of the others.

Then each has a chance $1 - 1/N$ of losing, and the chance that they all lose is
$$\left(1-\frac{1}{N}\right)^{Na} \approx e^{-a},$$
where $e \approx 2.718$ is the base of natural logarithms, and is also the limit of $(1 + 1/x)^x$ as $x$ gets large; in the same way, $(1 - 1/N)^N$ tends to $e^{-1}$ as $N$ gets large. For $a = 1,2,3,4,5$ this gives the results in the Table.

How the chance nobody wins depends on the number of tickets sold

| Number of tickets sold | Chance nobody wins |
|---|---|
| $N$ | 37% |
| $2N$ | 13% |
| $3N$ | 5% |
| $4N$ | 2% |
| $5N$ | 1% |
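A quick numerical check of the lottery calculation, assuming for illustration a 6-from-49 lottery, so that $N = \binom{49}{6} = 13{,}983{,}816$, and tickets that each win independently with chance $1/N$ as above. The helper name `chance_nobody_wins` is hypothetical.

```python
import math

def chance_nobody_wins(tickets: int, N: int) -> float:
    """Exact chance that every ticket loses, if each wins independently with probability 1/N."""
    # exp(tickets * log1p(-1/N)) equals (1 - 1/N)**tickets but stays accurate when 1/N is tiny
    return math.exp(tickets * math.log1p(-1 / N))

N = math.comb(49, 6)  # illustrative: number of combinations in a 6-from-49 lottery
for a in range(1, 6):
    exact = chance_nobody_wins(a * N, N)
    print(f"{a}N tickets: exact {exact:.3%}, approximation e^-{a} = {math.exp(-a):.3%}")
```

For $N$ this large the exact and approximate values are indistinguishable at this precision.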

Matching problems

Suppose $N$ people pick a number at random between 1 and $T$. Imagine them in a line picking numbers one at a time (without consulting each other). Let the probability that they all pick different numbers be $p(N,T)$. For this event to occur, the second picks a different number from the first, the third picks a different number from the first two, and so on, so
$$p(N,T) = \left(1-\frac{1}{T}\right)\left(1-\frac{2}{T}\right)\cdots\left(1-\frac{N-1}{T}\right),$$
which may be easily calculated.
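The product is indeed easy to compute directly. This sketch uses a hypothetical function `p_all_different` and, as an illustration, the familiar birthday setting of $N = 23$ people and $T = 365$ days (ignoring leap years), where the chance that all birthdays differ is just under one half.

```python
def p_all_different(N: int, T: int) -> float:
    """Exact chance that N independent uniform picks from 1..T are all different."""
    prob = 1.0
    for k in range(1, N):
        prob *= 1 - k / T  # the (k+1)th person must avoid the k numbers already taken
    return prob

print(f"{p_all_different(23, 365):.4f}")
```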

Now, provided $N$ is reasonably small compared to $T$, we have $\left(1-\frac{k}{T}\right) \approx e^{-k/T}$ for $k = 1,2,\ldots,N-1$, and so the probability that they all choose different numbers is
$$p(N,T) \approx e^{-\frac{1}{T}[1+2+\cdots+(N-1)]} = e^{-\frac{N(N-1)}{2T}},$$
using the fact that the sum of the first $N-1$ integers is $N(N-1)/2$. For reasonably large $N$, $N(N-1) \approx N^2$, so this can be approximated by $e^{-\frac{N^2}{2T}}$. So we can complete the Table:
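To see how much each approximation step costs, the sketch below compares the exact product with $e^{-N(N-1)/2T}$ and with the simpler $e^{-N^2/2T}$. The helper names and the two $(N, T)$ pairs are illustrative assumptions.

```python
import math

def p_all_different(N: int, T: int) -> float:
    """Exact product (1 - 1/T)(1 - 2/T)...(1 - (N-1)/T)."""
    prob = 1.0
    for k in range(1, N):
        prob *= 1 - k / T
    return prob

def approx_pairs(N: int, T: int) -> float:
    """First approximation: exp(-N(N-1)/(2T))."""
    return math.exp(-N * (N - 1) / (2 * T))

def approx_simple(N: int, T: int) -> float:
    """Simpler approximation for large N: exp(-N^2/(2T))."""
    return math.exp(-N**2 / (2 * T))

for N, T in [(23, 365), (100, 2500)]:
    print(f"N={N}, T={T}: exact {p_all_different(N, T):.4f}, "
          f"N(N-1)/2T form {approx_pairs(N, T):.4f}, N^2/2T form {approx_simple(N, T):.4f}")
```

All three agree to within about 0.02 in both cases, which is accurate enough for whole-number percentages.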

Chance that no matches occur, when choosing $N$ numbers at random between $1$ and $T$

| $T$ | $N^2/2T$ = expected number of matching pairs | $p(N,T)$ | Chance that all are different |
|---|---|---|---|
| $N^2/2$ | $1$ | $e^{-1}$ | 37% |
| $N^2/4$ | $2$ | $e^{-2}$ | 13% |
| $N^2/6$ | $3$ | $e^{-3}$ | 5% |
| $N^2/8$ | $4$ | $e^{-4}$ | 2% |
| $N^2/10$ | $5$ | $e^{-5}$ | 1% |
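Each row of this table can be checked against the exact product. This sketch assumes $N = 600$ for illustration, chosen because $N^2$ is divisible by 2, 4, 6, 8 and 10, so every $T$ in the table is a whole number.

```python
import math

def p_all_different(N: int, T: int) -> float:
    """Exact product (1 - 1/T)(1 - 2/T)...(1 - (N-1)/T)."""
    prob = 1.0
    for k in range(1, N):
        prob *= 1 - k / T
    return prob

N = 600  # illustrative value; N*N = 360000 divides evenly for every row
for expected in range(1, 6):
    T = N * N // (2 * expected)  # the row of the table where N^2/2T = expected
    print(f"T = N^2/{2 * expected} = {T}: exact {p_all_different(N, T):.4f}, "
          f"e^-{expected} = {math.exp(-expected):.4f}")
```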

So if $N$ people choose numbers at random between 1 and $N^2/4 = (N/2)^2$, there is about a 13% chance that they will all choose different numbers.
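The 13% figure can also be checked by simulation. This Monte Carlo sketch assumes $N = 100$, so the numbers run from 1 to $(N/2)^2 = 2500$; the function name, trial count and seed are arbitrary choices. The estimate should land near 0.13, give or take sampling noise.

```python
import random

def simulate_all_different(N: int, T: int, trials: int, seed: int = 1) -> float:
    """Estimate the chance that N uniform picks from 1..T are all different."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    successes = 0
    for _ in range(trials):
        seen = set()
        for _ in range(N):
            pick = rng.randint(1, T)  # inclusive of both 1 and T
            if pick in seen:
                break  # a match: this trial fails
            seen.add(pick)
        else:
            successes += 1  # loop finished without a match
    return successes / trials

N = 100
print(f"{simulate_all_different(N, (N // 2) ** 2, trials=10_000):.3f}")
```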
