Heart disease screening: where did the uncertainty go?
The UK news media have reported quite extensively on a newly published paper by Wald, Simmonds and Morris on screening for cardiovascular disease (CVD) - that is, disease of the heart and blood vessels (principally heart attacks and strokes). The paper concluded that screening should be based on the patient's age alone, and not on other risk factors for CVD. But I'm not going to examine that conclusion - I'm interested in the way the authors dealt with uncertainty.
Typically in a paper reporting on health screening, one would expect to see numerical statements about uncertainty - "a 95% confidence interval for such-and-such a quantity goes from x to y", or "the standard error of such-and-such an estimate is z." But Wald and his colleagues don't do that, for what turn out to be pretty sensible reasons that perhaps throw some light on ways of dealing with uncertainty more generally.
In the paper, the authors do two main things. First, they compare an existing method of screening for CVD, based on a series of known risk factors, with the alternative of simply declaring that everyone over a certain age has had a positive screening result. The two screening methods are compared in terms of how good they are at identifying people who will in the future have an episode of CVD, as well as how good they are at excluding people who are not going to have CVD. Second, they compare the cost-effectiveness of the two screening approaches.
At first glance, a screening approach based on age alone sounds pretty odd. Generally, screening for disease operates by carrying out some sort of test or measurement on individuals, and then using the results to declare each individual as either positive or negative. People who screen positive are more likely to have the disease than are people who screen negative, and generally some further investigation or treatment would be carried out for those who screen positive but not for those who screen negative.
Screening based on age alone works by declaring that everyone over a certain age screens positive - Wald and his colleagues suggest that age might be 55 - while everyone under the chosen age would be counted as screening negative. This perhaps looks odd at first glance because, although heart disease and strokes are pretty common, it's just not the case that everyone over 55 is going to get CVD, and similarly it's not the case that nobody under 55 will get it.
But it all begins to make sense if you think of screening not primarily as a way of detecting the people who are going to get CVD, but as a way of deciding who to investigate further or treat. After all, the action that occurs as the result of someone screening positive for a disease isn't simply that they get a label attached to them saying that they might have a disease - it's that some further investigation or treatment is proposed. In their age-based screening approach, Wald and his colleagues actually propose offering everyone above a certain age some ongoing preventative drug treatment for CVD. The screening approach they use for comparison would not treat everyone, simply on the basis of their age, but instead offer treatment based on risk factors that include things like whether they smoke and what their blood pressure and cholesterol levels are as well as age.
Both screening methods will result in some people being offered treatment even though in fact they would never get CVD, even if they were not treated. These people would be referred to as "false positives". Both methods would also result in some people not being offered treatment even though in fact they do get CVD - "false negatives". (For age-based screening, these would be people who had a heart attack or stroke under the age of 55.)
There's some obvious uncertainty here, and it's implicit in the paper. The authors report that, if screening by age is in use, counting everyone aged 55 or over as positive, the "detection rate" is 86%. That is, of the people who actually get CVD, 86% are 55 or over. (Various other rates are also given in the paper.) The uncertainty lies in the fact that this does not tell us which specific individuals are going to get heart disease and which are not, but that's in the nature of screening.
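To make the idea of a detection rate concrete, here is a toy sketch in Python. It is not the authors' actual model: the age distribution of people who go on to have a CVD event is invented purely for illustration, chosen so that roughly 86% of them happen to be 55 or over.

```python
import random

random.seed(1)

# Invented toy distribution (NOT from the paper): ages at which simulated
# people have a CVD event, drawn from a normal distribution.
cvd_ages = [random.gauss(68, 12) for _ in range(100_000)]

# Age-based screening declares everyone at or above the threshold "positive".
threshold = 55
detected = sum(age >= threshold for age in cvd_ages)
detection_rate = detected / len(cvd_ages)
print(f"Detection rate at age {threshold}+: {detection_rate:.0%}")
```

The detection rate here is just the proportion of eventual CVD cases that the age cut-off would have flagged in advance; changing the threshold trades detection rate against the number of people offered treatment unnecessarily.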
But hang on - this detection rate of 86% sounds very precise. How do they know it's 86% and not, say, 85% or 92% or even 65%? Might you not expect them to say that the estimate of the rate is 86%, but add some measure of uncertainty, perhaps in the form of a confidence interval or standard error or some such?
The authors do not do this, because of the methodology they used for their investigation. Formal, numerical expressions of uncertainty such as confidence intervals are generally designed to take account only of sampling error, that is, uncertainty that arises because the conclusions are based on data that is only a sample from the population one is considering. (If you chose a different sample, you'd probably get slightly different results, so there is uncertainty that arises directly from the sampling process.)
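For comparison, here is a minimal sketch of what such a conventional interval would look like. If the 86% figure had been estimated from a real sample of people with CVD events, a standard 95% confidence interval for the proportion would quantify sampling error only; the sample size below is invented for illustration.

```python
import math

# Hypothetical numbers: suppose the 86% detection rate had been observed in
# a real sample of n people who went on to have CVD events.
n = 2_000            # invented sample size, for illustration only
p_hat = 0.86         # observed detection rate

# Standard error of a proportion, and a 95% normal-approximation interval.
se = math.sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"95% CI (sampling error only): {lo:.1%} to {hi:.1%}")
```

Note what the interval does and doesn't capture: it narrows as n grows, but it says nothing about whether the sampled population resembles the population you want to apply the result to.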
But Wald and his colleagues didn't do it like that. Their conclusions are based on analysis of a simulated set of individuals that don't actually exist and hence are not, in the literal sense, any kind of sample from any actual population.
Why, then, should we take any notice of the conclusions? How can they be telling us anything about the actual position in the real population of England and Wales (or anywhere else)? Well, that relies on the way they simulated the individuals. The simulated individuals - their properties and risk factors, and whether or not they have a heart attack or a stroke - were generated in a way that matches data from real populations, as measured by various surveys and studies involving many thousands of real people. The authors build up a pretty convincing case that their simulated population, in all important respects, is close enough to the real population of England and Wales to give reasonably reliable answers to the questions they ask in the paper.
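The general idea can be sketched as follows. Every distribution and the risk model below are invented placeholders, not the paper's actual inputs; the point is only the shape of the method: draw each risk factor from a distribution calibrated to published data, then apply a risk model to decide who has an event.

```python
import random

random.seed(0)

def simulate_person():
    """One simulated individual; all numbers are illustrative placeholders."""
    age = random.uniform(35, 80)
    smoker = random.random() < 0.20          # placeholder smoking prevalence
    systolic_bp = random.gauss(130, 15)      # placeholder blood-pressure dist.
    # Toy risk model: event probability rises with age, smoking and raised BP.
    risk = 0.0005 * age + 0.01 * smoker + 0.0002 * max(systolic_bp - 120, 0)
    event = random.random() < risk
    return age, smoker, systolic_bp, event

population = [simulate_person() for _ in range(50_000)]
events = sum(person[3] for person in population)
print(f"Simulated CVD events: {events} of {len(population)}")
```

Once such a population exists, any screening rule (age cut-off or multi-factor score) can be applied to it and its detection and false-positive rates read off directly - which is, in outline, what makes the authors' comparison possible without following real people for decades.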
But uncertainty remains. The studies that were used as a basis for the simulation had results that incorporate some uncertainty, for instance from sampling error. Many of them were carried out in the USA, so there is further uncertainty about the extent to which they apply to a UK population. And, in doing the simulations, the researchers made some minor simplifications to the information they got from the other surveys and studies, so that the simulated people do not exactly match the results of the other studies anyway.
Thus, despite the fact that the paper does not give any kind of numerical measure of the accuracy of figures like the 86% detection rate I mentioned above, there are many kinds of uncertainty involved. So does that mean the results are meaningless?
I'd say they are in fact meaningful, as long as we don't forget about the uncertainty. In the paper, Wald and his colleagues describe in some detail exactly what they did, and why in their view their results are probably close enough to the position in the real population for their conclusions to hold. That is, they give an explanation that is largely not numerical as to why others should believe their conclusion that age-based screening is, in terms of CVD detection, almost as good as the existing screening methods, and in practical terms preferable to them. They do not disguise the fact that there is uncertainty involved, but they are not explicit about measuring this uncertainty, and actually it's pretty difficult to see how it could be measured explicitly. Readers of the paper will have to make their own decision on whether they agree with the authors' arguments.
My reason for blogging about this is not, however, to praise the virtues of the approach taken by Wald and his colleagues. It is because this study draws attention very clearly to aspects of uncertainty that are always present, even in more conventional studies that do give numerical measures of uncertainty. If you want to apply the results of a study done in one population to the people in some other population, you will have to make a judgement about the extent to which the results do carry over to the new population. You might think that they will match pretty well; I might disagree. Without repeating the study in the new population, we can't be certain which of us is right. Very often, it is this sort of uncertainty that is most important in making recommendations on what to do, and not the sort of uncertainty arising from sampling error, which is relatively easy to measure and take into account.
A particular situation where these non-sampling aspects of uncertainty are very relevant is in making decisions about how to treat patients on the basis of information from clinical trials. A recent article by Ben Goldacre pointed out one aspect of this - a clinical trial report will generally be very explicit about the uncertainty from sampling error, but that will not help you if you have to decide whether to use the treatment being studied in some group of patients that doesn't match very closely those that were included in the clinical trial. (This may be an issue in the cost-effectiveness part of the CVD screening study - the data on effectiveness of the proposed preventative drug treatments is based on results from clinical trials and similar well-controlled studies, and things might be different in a situation where entire populations over a certain age are offered the drug treatment.)
Finally, you might be wondering why Wald and his colleagues used the approach of simulating a population, rather than going and observing real people. Well, think about what that would involve. You'd have to take a pretty large and representative sample of people and follow them up for many years, screening them at regular intervals by the existing method, and recording who develops CVD and when. This would be possible in principle (and indeed Wald and his colleagues suggest in their paper that it should be done in future). But it would take a great deal of time and cost a lot of money. And think about what would happen when it ended. Decisions would still need to be made on applying its conclusions in practice, and it still wouldn't get rid of the issue that the population that was studied will not be exactly the same as the population in which the conclusions might be implemented (because time has passed, if nothing else). That is, there would still be uncertainty that had nothing to do with sampling error. The uncertainty never goes away; one has to accept it and cope with it.