Court of Appeal bans Bayesian probability (and Sherlock Holmes)

As of the 23rd May 2022 this website is archived and will receive no further updates. It was produced by the Winton programme for the public understanding of risk, based in the Statistical Laboratory in the University of Cambridge. The aim was to help improve the way that uncertainty and risk are discussed in society, and show how probability and statistics can be both useful and entertaining.


..when you have eliminated the impossible, whatever remains, however improbable, must be the truth
(Sherlock Holmes in The Sign of the Four, ch. 6, 1890)

In a recent judgement the English Court of Appeal has not only rejected the Sherlock Holmes doctrine shown above, but also denied that probability can be used as an expression of uncertainty for events that have either happened or not.

The case was a civil dispute about the cause of a fire, and concerned an appeal against a decision in the High Court by Judge Edwards-Stuart. Edwards-Stuart had essentially concluded that the fire had been started by a discarded cigarette, even though this seemed an unlikely event in itself, because the other two explanations were even more implausible. The Court of Appeal rejected this approach although still supported the overall judgement and disallowed the appeal - commentaries on this case have appeared here and here.

But it's the quotations from the judgement that are so interesting:

Sometimes the "balance of probability" standard is expressed mathematically as "50 + % probability", but this can carry with it a danger of pseudo-mathematics, as the argument in this case demonstrated. When judging whether a case for believing that an event was caused in a particular way is stronger than the case for not so believing, the process is not scientific (although it may obviously include evaluation of scientific evidence) and to express the probability of some event having happened in percentage terms is illusory.

The idea that you can assign probabilities to events that have already occurred, but where we are ignorant of the result, forms the basis for the Bayesian view of probability. Put very broadly, the 'classical' view of probability is in terms of genuine unpredictability about future events, popularly known as 'chance' or 'aleatory uncertainty'. The Bayesian interpretation allows probability also to be used to express our uncertainty due to our ignorance, known as 'epistemic uncertainty', and popularly expressed as betting odds. Of course there are all gradations, from pure chance (think radioactive decay) to processes assumed to be pure chance (lottery draws), to future events whose odds depend on a mixture of genuine unpredictability and ignorance of the facts (whether Oscar Pistorius will be convicted of murder), to pure epistemic uncertainty (whether Oscar Pistorius knowingly shot his girlfriend).

The judges went on to say:

The chances of something happening in the future may be expressed in terms of percentage. Epidemiological evidence may enable doctors to say that on average smokers increase their risk of lung cancer by X%. But you cannot properly say that there is a 25 per cent chance that something has happened: Hotson v East Berkshire Health Authority [1987] AC 750. Either it has or it has not.

So according to this judgement, it would apparently not be reasonable in a court to talk about the probability of Kate and William's baby being a girl, since that is already decided as true or false (but see note added below). This seems extraordinary.

Part of the problem may be the judges' use of the word 'chance' to describe epistemic uncertainty about whether something has happened or not - this would be unusual usage now (even though Thomas Bayes used 'chance' in this sense). If they had used the term 'probability' perhaps their quote above would seem more clearly unreasonable.

Anyway, I teach the Bayesian approach to post-graduate students attending my 'Applied Bayesian Statistics' course at Cambridge, and so I must now tell them that the entire philosophy behind their course has been declared illegal in the Court of Appeal. I hope they don't mind.

(Note added 1st March 2013: William Hill are currently offering 1000-1 against Chardonnay as the name of the potential future monarch).


Under common law (aka Anglo-American jurisprudence) rulings of law must follow precedent. Additionally, novel rulings establish precedent. In other words, to be fair to Alice, we must use the same criteria for judgement that we use for Bob. Furthermore, it is established law that someone is "innocent until proven guilty". Using these two axioms, we can see that while refusing to allow Bayesian statistics in this case may seem unenlightened, it is nevertheless legally essential that courts do so, as holding someone legally responsible for an action that they did not commit is reprehensible under our legal system, whereas letting someone off the hook for something that they did is merely unfortunate. In our system of law, it is desirable that investigators utilize all available tools, including math, to discover likely culprits and that they use that information to refine and direct their search for actual evidence. It is also desirable that courts refuse to allow the tools to be used as evidence and require that substantial proof be provided.

There's no logical difference between probabilistic evidence and "substantial proof". It's all just a matter of more or less certainty. Of course courts shouldn't hold anyone legally responsible based on anything but a very high degree of probability, aka solid proof. It's no wonder if courts lack the necessary understanding of probability as "epistemic uncertainty", given that educated statisticians don't have it. (Bayesian thinking isn't a mandatory part of the curriculum in any university as far as I know - or has this changed lately?)

One issue here seems to be the 'closed hypothesis space' that sometimes gets flung at Bayesians, particularly with regard to model checking. Here it seems to come up in point 20 of the original report where everything except the cigarette and the arcing theories are ruled out: "The judge rejected the possibilities that the fire was caused by an intruder or by a cigarette discarded by someone other than Mr Nulty. There is no appeal against those conclusions." Not only is there no appeal - that's a legal matter - but the absence of any other possibilities is then built into the problem statement. One might wonder how the (seemingly rather large number of) other possibilities were so speedily assigned prior probability zero, and what might happen if they were not. Indeed, if that conclusion was also itself a 'balance of probabilities' argument then there would seem to be plenty of unacknowledged inferential room to move posterior weight around the hypothesis space - perhaps some of it away from the defence's fag packet.
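The worry about a closed hypothesis space can be illustrated with a small sketch. The weights below are hypothetical and are not taken from the judgment; they only show that the most probable listed cause can stop being "more likely than not" once an unlisted "something else" is given non-zero prior weight.

```python
from fractions import Fraction

def normalise(weights):
    """Turn relative weights into probabilities that sum to 1."""
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}

# Hypothetical relative weights: only the two listed causes are allowed.
closed_space = normalise({"cigarette": 5, "arcing": 1})

# Same weights, plus a hypothetical catch-all for causes nobody listed.
open_space = normalise({"cigarette": 5, "arcing": 1, "something else": 5})

print(closed_space["cigarette"])  # 5/6
print(open_space["cigarette"])    # 5/11
```

In both versions the cigarette remains the single most probable listed cause, but only in the closed version does it exceed one half.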

I agree. I have never really liked the Sherlock Holmes doctrine as it assumes you have thought of everything. Which is why it is so surprising that this is not the basis that the Court of Appeal use to reject the doctrine: indeed they agree that there are only 3 explanations!

So the Court of Appeal bans Bayesian probability. Personally, I would allow it but bring back forced labour for those Bayesians who cannot substantiate their priors. Of course this might mean that I'm advocating forced labour for Bayesians tout court, but then I'm not entirely convinced that's a bad thing. More seriously, the conclusion (appeal denied, the ciggie explanation is way more likely than the cable arc) sounds compliant with Bayes' theorem to me. I think the "past event" comment is really a warning not to confuse epistemic and real world probabilities. The past event either did, or did not happen, and the court has to, at least, have a stab at figuring out if it did. Not to do so risks treating any event with an epistemic probability greater than 50% as "proven on the balance of probabilities" with no further enquiry. This would make for short court cases and reduce the earnings of lawyers (which might be even better than forced labour for Bayesians). But it wouldn't seem to serve justice much; I'd hate to be suing an insurance company, walk into court and find the judge actually listening to an argument such as: "In most disputed claims with insurance companies the insurance company turns out to be right, so, on the balance of probabilities, the insurance company is right." Of course, the whole thing is written in "lawyer", so we both might be horribly mistranslating it into "normal".

"... forced labour for those Bayesians who cannot substantiate their priors." And an unspecified chance of forced labour for all non-Bayesians, i.e. those who mistakenly believe they have no priors. Ignorantia juris non excusat!

This makes me very angry. It's like the Indiana Pi bill again, only worse. These idiot judges have demonstrated that they haven't the tiniest idea what knowledge actually is, which should scare the bejezus out of everybody. They are not competent to perform their duties, and they should be removed. This is not some trivial choice between one or another statistical philosophy, this is judges rejecting the rational evaluation of evidence.

Could be misquoted to suggest that the court have just banned something that doesn't exist, like saying unicorns are not allowed to serve on the jury. Sorry, that doesn't help really, does it? More seriously, this ruling bans random effects models used with league tables (and to be fair, we don't teach a lot of people about metapopulations and what not).

It seems more bizarre in light of the fact that DNA evidence is entirely based on Bayesian probabilities. I do pity the poor forensic scientists who have to explain this evidence to a jury.

Of the three reasons given for rejecting Holmes' dictum, the first is a point of law and I won't argue with that. The second is right, I think. Conan Doyle could construct situations with only a handful of possibilities but in the real world there are many possibilities, most of them with very low probability of occurrence. In such situations Holmes' dictum is impossible to implement. The third objection seems to me to be a confusion. One starts with an event of a certain probability and then revises it as one considers further evidence. An event that starts off with a low probability can end up as more likely than not, thus meeting the legal need for 'the balance of probability'.

Dear Sir:

"In a recent judgement the English Court of Appeal has not only rejected the Sherlock Holmes doctrine shown above, but also denied that probability can be used as an expression of uncertainty for events that have either happened or not.


So according to this judgement, it would apparently not be reasonable in a court to talk about the probability of Kate and William's baby being a girl, since that is already decided as true or false. This seems extraordinary."

It would be, indeed, were that what the EWCA had stated or implied. But the court did not.

Their Lordships were referring to past events; not future events. What Longmore LJ meant by the passage you later quoted (from [35] of the reasons) is that in the area of the law involved, once a court decides that a PAST event probably did or did not occur, the occurrence (or non-occurrence) is treated as a certainty.

The point is made, near the end, in the case comment at the second link you provided.

Yours truly,

I am sorry, I must disagree with this: whether the child is a boy or a girl is a past event that has already been decided - the uncertainty is completely epistemic

What is interesting about this is that the judge has used almost the same words we have heard from several lawyers. One of the quotes (we gave this in Chapter 1 of our book "Risk Assessment and Decision Analysis with Bayesian Networks") from an eminent lawyer was: “Look the guy either did it or he didn’t do it. If he did then he is 100% guilty and if he didn’t then he is 0% guilty; so giving the chances of guilt as a probability somewhere in between makes no sense and has no place in the law”. Of course, as we show in the book (Chapter 1 is freely available for download) you can actually prove that the judge's assertion is flawed in the sense that it inevitably leads to irrational decision-making. The key point is that there can be as much uncertainty about an event that has yet to happen (e.g. whether or not your friend Naomi will roll a 6 on a die) as one that has happened (e.g. whether or not Naomi did roll a six on the die). If you did not actually see the die rolled in the second case your uncertainty about the outcome is no different than before it was rolled, even though Naomi knows for certain whether or not it was a six (so for her the probability really is either 1 or 0). As you discover information about the event that has happened (for example, if another reliable friend tells you that an even number was rolled) then your uncertainty changes (in this case from 1/6 to 1/3). And that is exactly what is supposed to happen in a court of law where, typically, nobody (other than the defendant) knows for certain whether the defendant committed the crime; in this case it is up to the jury to revise their belief in the probability of guilt as they see evidence during the trial. David points out that the judge is not just 'banning' Bayesian reasoning, but also banning the Sherlock Holmes approach to evidence.
But it is even worse, because the judge is essentially banning the entire legal rationale for presenting evidence (namely to determine the probability that the defendant committed the crime). For more on this see:
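The die example in this comment can be checked by simple enumeration. The sketch below assumes a fair six-sided die and a fully reliable report that the roll was even; the helper name `probability` is only illustrative.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]

def probability(event, outcomes):
    """Probability of `event` when every outcome in `outcomes` is equally likely."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

def is_six(face):
    return face == 6

# Before any information: all six faces are still possible for the observer.
prior = probability(is_six, faces)

# After a reliable friend reports "even": only 2, 4 and 6 remain possible.
even_faces = [f for f in faces if f % 2 == 0]
posterior = probability(is_six, even_faces)

print(prior, posterior)  # 1/6 1/3
```

The roll itself is already fixed in both cases; only the observer's information changes.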

Well put, I totally agree :)

To follow up Blaise Egon's point about the trial judge's third objection to "Sherlock Holmes' dictum". This was phrased as follows: "If a judge concludes, on a whole series of cogent grounds, that the occurrence of an event is extremely improbable, a finding by him that it is nevertheless more likely to have occurred than not, does not accord with common sense." Which only goes to show how unreliable common sense is as a guide to understanding probability.
Once we appreciate that there is no such thing as "the" probability of an event, but rather a shifting value, relevant to the evidence on which its assessment is based, this apparent paradox is instantly resolved. In this case, two explanations that were both *a priori* unlikely, *before* we learned the fact that a fire occurred, accrue conditional (posterior) probabilities, given that important evidence, that become appreciable. The logic is related to that which *should have been* applied in the Sally Clark case, where, looking forwards, both stories - that her two sons would die of natural causes, or alternatively by murder - were initially extremely unlikely, but, *given that the boys had in fact died*, both stories could be taken seriously. Incidentally, that case also shows that we must be very cautious over applying Sherlock's dictum in a probabilistic setting: a hypothesis that has extremely low *prior* probability should not necessarily be considered impossible. What matters is how its probability compares with those of alternative explanations.
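A minimal numerical sketch of this point follows. The prior probabilities are invented purely for illustration, and the sketch assumes that the two causes are mutually exclusive, that each would certainly produce a fire, and that nothing else could have caused one.

```python
from fractions import Fraction

# Hypothetical prior probabilities that each cause occurs at all
# (both tiny before we know that there was a fire).
prior = {
    "cigarette": Fraction(1, 10_000),
    "arcing": Fraction(1, 50_000),
}

# Under the stated assumptions, P(fire) is just the sum of the priors.
p_fire = sum(prior.values())

# Conditioning on the fire: P(cause | fire) = P(cause) / P(fire).
posterior = {cause: p / p_fire for cause, p in prior.items()}

for cause, p in posterior.items():
    print(cause, p)  # cigarette 5/6, then arcing 1/6
```

Neither prior exceeds one in ten thousand, yet after conditioning on the fire one explanation is more likely than not. What drives the result is the ratio between the priors, not their absolute size.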

Thank you for this piece, David. What a shameful piece of legislation. Any idea how it can be changed?

Dear Sir: "I am sorry, I must disagree with this: whether the child is a boy or a girl is a past event that has already been decided - the uncertainty is completely epistemic." I am treating this as a response to my comment. With respect, I do not see how it is. Your statement is a true statement; however, I do not see it as a response to anything I wrote. Nor do I see it as responsive to anything in the reasons that you criticize. Let's assume for argument that the world outside the courtroom knows the sex of the child. That doesn't mean the tribunal knows what the answer is - if the sex of the child is an issue the court has to decide - unless the evidence required to answer the question is adduced in accordance with the rules of the tribunal. If it isn't, the party with the onus has failed to establish the sex of the child. That's it. Nothing more. Courts make decisions on evidence that is by the rules of the system less than complete. The rules of evidence may (and often do) prevent the judge (or jury) from considering evidence that by a Bayesian (or any other) analysis could be seen as relevant to the decision as to whether X did or did not occur. Again, all the EWCA meant by the text that you have misunderstood is that once a decision is made about the occurrence or non-occurrence of a past event - even if that decision was based on a 50+ some infinitesimal more % basis - the conclusion is treated as a certainty. The certainty level is no different than if the court were satisfied as to the correctness of the conclusion on an Ivory Snow basis, or even the basis that Wolfgang Pauli would have used to describe the conclusion opposite to the one the court made. In any event, perhaps you should discuss your understanding of what happened in this particular case with members of the law faculty who teach torts and evidence theory. One of them (should he or she teach torts) might even recognize my name. (Pronunciation may vary). Cheers,

I am no statistician, so I could well be wrong. But I think the problem that the judges are pointing at is a logical one: they were offered three alternatives, two were discounted and therefore it was suggested they must accept the third. Quite possibly their explanation of Bayesian statistics is faulty, I'm not trying to defend the words they used. But the logic seems sound - two options have been disproved, another seems unlikely. But they're not forced to accept that, because there could be some other explanation which hasn't been thought of. I don't think this is an attack on Bayesian statistics (and please don't stop teaching it, we need all the statisticians we can get!) unless this situation has occurred before and there is a body of evidence which can be analysed and is reliable enough to make predictions about the future. Holmes is bunk, as was cleverly shown by Chesterton's Father Brown. You are never going to know what you don't know, and hence how can you ever know you've exhausted all possible eventualities? I don't think that is quite the same as using previous observations when you have a lot of the same thing happening. But as I said, I'm no statistician, I could be wide of the mark.

A judge banned Bayes' Theorem a couple of years ago: I see David was recruited to oppose this sort of thing. It is not clear from the story whether it was the good reverend or the process of estimation with imperfect data that was forbidden.

I am no lawyer, but my reading of the judgement is that not only should there be 'a balance of probability', but that there should be a preponderance of 'weight of evidence' in order to convict. I would find this reassuring, and far preferable to taking the description too literally. The test might be better expressed as 'on an adequate balance of evidence'. Hard to express without getting mathematical!

If I had to argue that the controversial dicta in Nulty v Milton Keynes Borough Council [2013] EWCA Civ 15 at para 37 were correct I would look to see where they came from. The reference is to Hotson v East Berkshire Health Authority [1987] AC 750, and in that case Lord Mackay said: “ ... if a claimant alleges that he sustained a certain fracture in a fall at work and there is evidence that he had indeed fallen at work, but that shortly before he had fallen at home and sustained the fracture, the court would have to determine where the truth lay. If the claimant denied the previous fall, there would be evidence, both for and against the allegation, that he had so fallen. The issue would be resolved on the balance of probabilities. If the court held on that balance that the fracture was sustained at home, there could be no question of saying that since all that had been established was that it was more probable than not that the injury was not work-related, there was a possibility that it was work-related and that this possibility or chance was a proper subject of compensation.” That is, if in Nulty there was evidence that the fire was caused by a stranger or by a rat, there would be no issue of Mr Nulty’s liability. In the case although those possibilities were raised, there was no evidence to support them, so they could be dismissed. But if there had been evidence to support either alternative, the court would have to determine whether it was true. If, on the balance of probabilities the fire had been caused by a stranger, then in this civil case, it had, as a conclusive fact, been caused in that way and the court would not have to move on to consider Mr Nulty’s liability. As a question of “was it or was it not caused by a stranger,” the answer would have been in the affirmative – absolutely and not as a matter of probability. 
This is what the court must have been thinking of when it said in Nulty at [37], “...But you cannot properly say that there is a 25 per cent chance that something has happened: Hotson v East Berkshire Health Authority [1987] AC 750. Either it has or it has not.” This was said in the context of dealing with the suggested approach at [36] of listing the alternatives and assigning probabilities to each. But alternatives relevant to the live issue (was it Mr Nulty’s cigarette butt or an electrical fault that started the fire) are different to alternatives that avoid the need to consider the issue at all. So although the court introduced confusion by failing to specify what sort of issue it was referring to, the dictum is not as alarming for Bayesians as might appear on first reading. This apology for the court is supported by the rest of the judgment. The reasoning is conventional in legal terms, and it is consistent with a Bayesian approach. This is one of several cases where judges seem to be disavowing a mathematical approach while unconsciously obeying mathematical principles.

I should add, lest I be accused of sycophantic toadyism, that I think the court’s application of Hotson in para 37 of Nulty is nothing more than a truism. It comes down to saying, when a case is decided, it is decided. This is because in dealing with the issue of alternative cause (for example stranger or rat) the court would have to consider all the relevant evidence, including evidence that the fire was caused by the butt or the electrical fault. The whole case is decided in one trial. The decision process is, or should be, equivalent to multiplying the likelihood ratios for each item of evidence (that is, the probability of getting the evidence on the hypothesis that Mr Nulty caused the fire divided by the probability of getting the evidence on the hypothesis that something else caused the fire), and the priors (the probability that this sort of fire was caused by a cigarette butt, divided by the probability that it was caused by something else), to get the ratio of the probability that Mr Nulty caused the fire to the probability that something else did. Although data for the probabilities would not necessarily be available the court could do its best to make sensible estimates. But statistics should be put before the court wherever possible. So although the court did not reject it, if it had deliberately applied a Bayesian approach it would have avoided its fatuous comment in para 37.
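The decision process described in this comment is the odds form of Bayes' theorem: posterior odds equal prior odds multiplied by the likelihood ratio of each item of evidence. The sketch below uses made-up numbers, and it assumes the items of evidence are conditionally independent given each hypothesis, which is what licenses simply multiplying the ratios. The function names are illustrative.

```python
from math import prod

def posterior_odds(prior_odds, likelihood_ratios):
    """Odds form of Bayes' theorem for conditionally independent evidence."""
    return prior_odds * prod(likelihood_ratios)

def odds_to_probability(odds):
    """Convert odds in favour of a hypothesis into a probability."""
    return odds / (1 + odds)

# Hypothetical prior odds that this sort of fire is caused by a cigarette
# rather than by something else (1 to 4 against).
prior_odds = 0.25

# Hypothetical likelihood ratios, one per item of evidence:
# P(evidence | Mr Nulty caused it) / P(evidence | something else caused it).
likelihood_ratios = [4.0, 2.0, 0.5]

odds = posterior_odds(prior_odds, likelihood_ratios)
probability = odds_to_probability(odds)

print(odds, probability)  # 1.0 0.5
```

With these particular numbers the evidence moves the hypothesis from odds of 1 to 4 against up to exactly evens, which sits right on the balance-of-probabilities threshold without passing it.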

Could somebody explain a "testable" formulation of the concept known as "beyond reasonable doubt", which these lawyer types speak of so often, without using Bayesian statistics?

For a review of the way judges treat "beyond reasonable doubt" see R v Wanhalla, available at . This is a New Zealand Court of Appeal case but it reviews the law in several other countries. The judgment of Glazebrook J is particularly interesting. I have commented on this case at: .

I have found these comments fascinating, but I am not a statistician. May I give a legal perspective? The Court was not considering Bayesian methodology: it was deciding whether to upset the trial judge because the reasoning was so flawed as to be wrong in law. The trial judge saw 3 possible causes of the fire, all inherently improbable if taken on their own. Of those 3, the discarded cigarette was the most probable. But that did not decide the case, because there was a burden of proof on the party who made that argument. Burden of proof says who in a trial shoulders an evidential burden. The standard of proof is the balance of probabilities. Thus, the legal question at trial was: does showing the cigarette to be the most probable of the (unlikely) causes satisfy the burden of proof? The challenge to the trial Judge was that he had not followed The Popi M of 1985 where the eminent trial judge accepted the inherently improbable submarine hypothesis. He was overturned on this. If we pause, it seems fair to note the evidential differences between the two cases. There was no proof of any submarine being present, it was a hypothesis only. Yet there was an electrician who smoked, albeit his written evidence while alive was that he had smoked elsewhere. This is what the Court was dealing with at paras 37-40. In the para (37) which seems to offend the statisticians, it was said: "In deciding a question of past fact the court will, of course, give the answer which it believes is more likely to be (more probably) the right answer than the wrong answer, but it arrives at its conclusion by considering on an overall assessment of the evidence (i.e. on a preponderance of the evidence) whether the case for believing that the suggested event happened is more compelling than the case for not reaching that belief (which is not necessarily the same as believing positively that it did not happen)."
And, in order to distinguish Popi M, it was said in para 40: "In that case the combined effect of the gaps in the court's knowledge and the cogency of the factors telling against the theory of a collision with a submarine was that the court could not properly be persuaded that the case for believing the submarine theory was stronger than the case for remaining agnostic." Thus were the 2 cases reconciled. I would agree with Don Mathias's comments in defence of the Court's reasoning. I can't disagree with Joe's comment. AP Dawid also seems to be making similar points. His last sentence is apt: "a hypothesis that has extremely low *prior* probability should not necessarily be considered impossible. What matters is how its probability compares with those of alternative explanations." Whether the comparison of probabilities can be expressed mathematically is another question. I am sure it can be, in theory. But would it not require an intimate knowledge of all the evidence at trial, the 'data'? Courts traditionally don't operate in this way, or at least they proceed with statistical caution. The evaluation of evidence and probabilities is intuitive and logical. And do not forget the burden of proof. The cases establish that if no proper (word beloved of Courts) conclusion can be drawn, then the burden has not been met. It is the last refuge of the puzzled judge, but in some cases it gives the only sensible and pragmatic answer.

Probability relies on pure reason and, in law, pure reason is supposed to trump everything else. Hence one can appeal a judgement on the basis that "no reasonable judge" would have made it. The relationship between evidence and "the balance of probabilities" is obviously meaningless without comprehension of probability. However, I think that references to Bayesian probability, which suggests university level maths, complicate matters because the law is the profession most open to the type of people who gave up maths with relief at 16 to study history or English, and is therefore dominated by such people. In most legal cases simple probability (elementary Bayesian probability, but not called Bayesian probability) as taught to 15 year-olds in England (although in most cases unsuccessfully) is all that most lawyers involved need to understand: NOT, AND, OR and the meaning of independence. They need to be encouraged to revisit their O level maths text books instead of dismissing probability as too difficult, esoteric, boring and gauchely nerdy, and relying instead on "common sense". I notice that having dismissed probability-based decision-making as quasi-mathematical, LJ Toulson accepted in paragraph 60 Mr Bailey's expert opinion about the extreme improbability of "a whole series of improbable events" apparently without recognising that this relied on the probability multiplication rule. The worst mistake made by judges is the false presumption that the supporting propositions in a civil case (i.e. the propositions necessary to the ultimate proposition in the case) can be elevated on the balance of probability to findings of certain fact, bypassing the multiplication rule, thus bypassing the problem that a whole series of merely PROBABLE events (or even just two) can be improbable, and thus routinely making the wrong decisions. The epistemic question discussed above is also caused by too much "common sense".
It is resolved simply by distinguishing between facts and propositions. Facts (even about the future) are not probable. They simply were, are or will be. We use words to make simple propositions about facts. In our varying levels of ignorance of the facts, we agree or disagree about our propositions' varying levels of probability. It is the judge's job to make multiple propositions, to estimate their probabilities, and to assemble them according to the simple rules of probability, and to reach his decision on the ultimate proposition. "Weighing" the evidence is not a helpful metaphor when dealing with either the criminal or civil standards of proof.
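The multiplication-rule point can be made concrete with a short sketch. The probabilities are hypothetical, and the supporting propositions are assumed to be independent, so that the probability of all of them holding together is the product of the individual probabilities.

```python
from math import prod

# Hypothetical probabilities for supporting propositions, each of which
# would pass the balance-of-probabilities test if judged on its own.
two_propositions = [0.7, 0.7]
three_propositions = [0.7, 0.7, 0.7]

# Assuming independence, the probability that all of them are true together.
joint_two = prod(two_propositions)
joint_three = prod(three_propositions)

print(round(joint_two, 3), round(joint_three, 3))  # 0.49 0.343
```

Each proposition is individually "more likely than not", yet the conjunction of even two of them falls below one half, which is the problem with elevating each finding to certainty before combining them.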

While the appeals court ruling was less than well-worded, its essence of focus is legal, not statistical. That is, from a legal perspective the one who carries the burden of proof in a civil case must show by a preponderance (or greater weight) of the evidence that the act complained of was the proximate cause of the injury. It is NOT a beauty contest between a closed rank of three hypotheses (regardless of the fact that the parties stipulated that there were only three possibilities). No, the legal test is “did Council prove by a preponderance of the evidence that Nulty’s negligently discarded cigarette caused the fire?” The “three hypotheses proposition” is a distraction at best, and a decidedly dastardly way to pigeon-hole the fact finder at worst. It seems to me that High Court Justice Edwards-Stuart was tricked (either by fundamental and negligent error, or perhaps by design) into believing that his only three options were: (1) cigarette butt, (2) arson, or (3) arcing. Once he started down that path he merely had to choose the most likely of the three. However, that displaces the rule of law which holds that a specific causal allegation must be proved by a preponderance. Ergo, his decision matrix should more properly have been: "Has Council shown by a preponderance that Nulty’s cigarette butt caused the fire?" By the Justice’s own admission, it did not. He only arrived at this fact because he believed that his decision must come from one of the three possible choices given to him by the parties. In fact, as the Court of Appeal pointed out, the Justice had the very real option to simply conclude that the moving party had failed to carry its burden. That would have been the correct reconciliation of both the law and the evidence.