# When is a 'cluster' a real cluster?

Six cyclists were killed in London within a 2-week period, between 5th and 13th November 2013. Should we be surprised that such a cluster occurred at some point in recent history?

In a paper in Significance, Aberdein and Spiegelhalter show that it is reasonable to assume that, over an 8-year period between 2005 and 2012, cycle deaths in London occurred as a 'Poisson process', and that the expected number of deaths in any 2-week period is 108/208 = 0.57, which we shall denote by $m$. This means that the probability of exactly $x$ deaths in a single 2-week period has a Poisson form
$$f_x = \frac{e^{-m} m^x}{x!},$$
and cumulative distribution function
$$F_x = \sum_{i=0}^{i=x} \frac{e^{-m} m^i}{i!}.$$

Setting $m=108/208=0.57$, the probability of getting at least 6 deaths in a particular 2-week period is $1 - F_5 = 0.00003$, or 1-in-35,000. The whole 8 years contains 208 disjoint 2-week periods and, since counts in disjoint periods of a Poisson process are independent, the chance of seeing at least 6 deaths in at least one of these periods is 1 - (the chance of seeing 5 or fewer in all 208 periods) = $1 - F_5^{208} = 0.006$, or 1-in-168.
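As a rough check on this arithmetic, the two probabilities can be computed directly from the Poisson formulas above. This is only a sketch: it takes the value $m = 0.57$ as quoted in the text, and the function names `poisson_pmf` and `poisson_cdf` are illustrative, not taken from the paper.

```python
import math


def poisson_pmf(x, m):
    """f_x: probability of exactly x events when the mean is m."""
    return math.exp(-m) * m**x / math.factorial(x)


def poisson_cdf(x, m):
    """F_x: probability of x or fewer events (taken as 0 for x < 0)."""
    if x < 0:
        return 0.0
    return sum(poisson_pmf(i, m) for i in range(x + 1))


m = 0.57          # expected deaths per 2-week period, as quoted in the text
n_periods = 208   # disjoint 2-week periods in 8 years

# Chance of at least 6 deaths in one particular 2-week period: 1 - F_5
p_single = 1 - poisson_cdf(5, m)

# Chance of at least 6 deaths in at least one of the 208 disjoint periods,
# using independence of the counts in disjoint periods: 1 - F_5^208
p_any = 1 - poisson_cdf(5, m) ** n_periods

print(f"single period: {p_single:.6f} (about 1-in-{1 / p_single:,.0f})")
print(f"any of {n_periods} periods: {p_any:.4f} (about 1-in-{1 / p_any:,.0f})")
```

The results should agree with the figures above to the precision quoted; the '1-in' values are sensitive to how $m$ and the probabilities are rounded.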

However, what we need is the chance of getting 6 or more deaths in any moving 2-week window, and this requires the use of so-called 'scan statistics'. The exact distribution theory is extremely tricky, but fortunately Naus (1982) has provided some useful and accurate approximations.
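To see why moving windows matter, consider a small sketch that counts events in fixed, disjoint windows and in every possible moving window of the same width. The event times below are made up purely for illustration (they are not the London data), and both function names are hypothetical.

```python
def max_in_disjoint_windows(times, width):
    """Largest count in the fixed windows [0, width), [width, 2*width), ..."""
    counts = {}
    for t in times:
        k = int(t // width)            # index of the fixed window containing t
        counts[k] = counts.get(k, 0) + 1
    return max(counts.values(), default=0)


def max_in_moving_window(times, width):
    """Largest count in any half-open window [s, s + width), for any start s."""
    times = sorted(times)
    best = 0
    left = 0
    for right, t in enumerate(times):
        # Slide the left edge until all events from left..right span < width
        while t - times[left] >= width:
            left += 1
        best = max(best, right - left + 1)
    return best


# Illustrative event days (invented): six events within 9 days,
# straddling the boundary between the first and second 14-day windows.
event_days = [10, 12, 13, 15, 16, 18]

print(max_in_disjoint_windows(event_days, 14))  # the cluster is split in two
print(max_in_moving_window(event_days, 14))     # the cluster is seen whole
```

In this example the fixed windows each capture only half of the cluster, while the moving window captures all six events, which is why the disjoint-window calculation understates the chance of observing such a cluster.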

Let $X$ be the maximum count in any moving window, where the number of occurrences within each window has a Poisson($m$) distribution, and the overall length of time of interest comprises $L$ disjoint windows; in our case $m = 0.57$ and $L=208$. Let $P_n(L) = P(X < n \mid L)$ be the probability of $X$ being less than $n$ given $L$. Naus shows the following exact identities (assuming $F(i) = 0$ if $i < 0$).