Is it fair that a single bad outcome should label a surgeon as an outlier?

Surgeons are increasingly subject to statistical monitoring, and named results may be made publicly available. But consider a surgeon in a low-risk specialty who has had a successful and blameless career, until a combination of circumstances, possibly beyond their control, contributes to a single patient dying. They then find they are officially labeled an ‘outlier’ and subject to formal investigation, all because of a purely statistical criterion. Is this fair?

The ‘outlier’ procedure works as follows. Adverse outcomes (such as deaths) are accumulated over a fixed period, say 1 or 3 years, and compared with the number expected given the surgeon's number of operations, adjusting (in some specialties) for the severity of their patients’ condition. If the probability of getting at least as many deaths as observed is small, assuming the surgeon was truly performing as expected, they are labeled an ‘outlier’ and a pre-specified series of investigative steps is launched. A ‘small’ probability might be 2.5%, 1%, or 0.1% - often 2.5% and 0.1% are used as the criteria for an ‘alert’ and an ‘alarm’ respectively (technically, these are essentially one-sided $p$-values). This process is subject to guidance from the Department of Health, and there are many tricky issues involved, such as risk-adjustment, over-dispersion, allowance for multiple comparisons and so on.
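As a rough illustration of the procedure just described, here is a minimal Python sketch. It assumes a plain binomial model in which every patient has the same risk of death, so it ignores risk-adjustment, over-dispersion and multiple comparisons entirely. The function names `upper_tail` and `classify` are made up for this example; the 2.5% and 0.1% thresholds are the ones mentioned above.

```python
from math import comb

ALERT = 0.025  # one-sided p-value below 2.5% counts as an 'alert'
ALARM = 0.001  # one-sided p-value below 0.1% counts as an 'alarm'


def upper_tail(deaths, n, p):
    """Chance of at least `deaths` deaths in n operations, each with risk p.

    Assumes a simple binomial model (same risk for every patient).
    Computed as 1 minus the chance of seeing fewer deaths than observed.
    """
    fewer = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(deaths))
    return 1.0 - fewer


def classify(deaths, n, p):
    """Label a result using the illustrative alert/alarm thresholds."""
    tail = upper_tail(deaths, n, p)
    if tail < ALARM:
        return "alarm"
    if tail < ALERT:
        return "alert"
    return "no signal"
```

Under these assumptions, `classify(1, 10, 0.001)` returns `"alert"`: one death in ten operations at a 1-in-1000 risk has a tail probability of just under 1%, whereas the same single death in 100 operations gives no signal.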

These statistical methods were developed in areas such as cardiac surgery, with an average mortality risk of more than 2%, so that every cardiac surgeon expects to have some deaths within each monitoring period. But when applied to low-risk surgical areas such as bariatric or endocrine surgery, with a mortality rate of less than 1 in 1000, they could mean that a single death entails a surgeon being considered an outlier. This may appear harsh.

But the analysis in the Appendix suggests the statistical process may not be so unfair after all. Essentially, for a surgeon to be considered an outlier on the basis of a single death, their expected number of deaths over the monitoring period has to be less than 1 in 40 (or whatever criterion is being used to define an outlier). What does this mean over a longer term? Since monitoring periods cover at least a year, and often more, an expected number of deaths of less than 1 in 40 per period implies that the expected number of deaths over an entire surgical career (less than 40 years) is less than 1, and often far less. So we get an even simpler criterion:

for a surgeon to be considered a statistical outlier on the basis of a single death, they must be working in a specialty where the majority of surgeons would not expect to see a death in their whole career.
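To spell out the arithmetic behind this, as a rough sketch only: write $E$ for the expected number of deaths in one monitoring period of $T$ years, and suppose (an illustrative simplification, not part of the formal procedure) that caseload and risk stay roughly constant over a career of $C$ years. Then, for $T \ge 1$ and $C \le 40$,

$$
E_{\text{career}} = \frac{C}{T}\,E \;<\; \frac{C}{T}\times\frac{1}{40} \;\le\; 40\times\frac{1}{40} \;=\; 1 .
$$

With a three-year monitoring period the same bound becomes $\frac{40}{3}\times\frac{1}{40} = \frac{1}{3}$.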

So a mechanical statistical criterion seems to come up with something rather reasonable: surgeons with an unusual-in-a-career bad outcome are subject to appropriate investigation.

The crucial element, though, is ‘appropriate investigation’. The Consultant Outcomes Publication programme means that named-surgeon outcome data is increasingly coming into the public domain on the MyNHS site. For cardiac surgery this includes mortality data, but for lower-risk specialties such as urological surgery, complication rates provide better discrimination. It seems questionable that mortality should be part of the consultant-level outcomes in any specialty where a single event can trigger an ‘alert’ and potentially lead to a surgeon being publicly ‘named and shamed’.

Finally, from a completely personal perspective, I admit to some ambivalence about the Consultant Outcomes Publication programme. Accountability and transparency are admirable objectives. But when I eventually have to go under the knife, I don't particularly want to look up my surgeon on some website, like using Trip Advisor before booking a hotel. I want to be able to assume that official bodies are doing the checking for me, and conducting proper monitoring and investigations. And appropriate statistical methods are an integral part of that process.

Appendix

To see when a single death triggers an ‘alert’, let’s assume the criterion being used is 2.5% (1 in 40): a single death triggers an alert when there is a less than 1 in 40 chance of getting at least one death, given ‘average’ performance. Suppose the average probability of a patient dying is $p$, where $p$ is very small (i.e. very low-risk surgery), and the surgeon has done $n$ operations during the period being monitored. Then the chance they have no deaths at all is the same as the chance that all their patients survive, which is $(1-p)^n$. Therefore the chance of getting at least one death is $1 - (1-p)^n$, which for small $p$ is closely approximated by $np$, the expected number of deaths for that surgeon. So for one death to trigger the ‘alert’ criterion, we need $np < 0.025$: i.e. the expected number of deaths is less than the $p$-value defining the ‘alert’ level.
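The approximation can be checked numerically. The following Python sketch compares the exact chance of at least one death with $np$, under the same assumption as above that each of the $n$ patients independently has risk $p$. The function names and the example figures (a risk of 1 in 1000, with 20 or 30 operations) are chosen purely for illustration.

```python
from math import expm1, log1p


def chance_of_any_death(n, p):
    """Exact chance of at least one death: 1 - (1 - p)**n.

    Written with log1p/expm1 so it stays accurate when p is tiny.
    """
    return -expm1(n * log1p(-p))


def single_death_triggers_alert(n, p, threshold=0.025):
    """True if one death would fall below the 'alert' threshold."""
    return chance_of_any_death(n, p) < threshold


# Compare the exact chance with the approximation np for two caseloads.
for n in (20, 30):
    p = 0.001
    print(n, chance_of_any_death(n, p), n * p, single_death_triggers_alert(n, p))
```

With a risk of 1 in 1000, a single death among 20 operations would trigger the alert, while a single death among 30 would not. The exact chance never exceeds $np$, so whenever $np$ is below the threshold the exact chance is too.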

Statement of interest

I have been a paid consultant to Dendrite Clinical Systems Ltd, who manage clinical audits. I have also been an unpaid statistical advisor to HQIP, who oversee the national audit programme, and NICOR, who conduct cardiac audits.