It's been nearly a year since my last post. I've been busy. However, lately I have been working through John Kruschke's book Doing Bayesian Data Analysis, and came upon a reference to Sherlock Holmes. The reference was to the quote, "How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth?" (from The Sign of Four). Kruschke restated it in Bayesian terms as, "How often have I said to you that when p(D|θi) = 0 for all i ≠ j, then, no matter how small the prior p(θj) > 0 is, the posterior p(θj|D) must equal one." (pg. 56). Holmes said many things relevant to Bayes' theorem, so this got me thinking that it would be fun to re-tool various quotes from Sherlock Holmes stories as a dialogue where Sherlock explains Bayes' theorem to Watson. See if you can identify the stories from which I appropriated some of the dialogue!
____________________________________"How do you do it Holmes?"
"I have my methods, Watson, how many times must I tell you this?"
"Yes yes, I know, what astonishes me is the ease and precision with which you tackle every case. Surely there is some general method you use that others could benefit from knowing."
"My methods are simply good logic. If you so desire, I will explain my methods. My methods are not truly my own, but are already available to any student of deductions."
"Get on with it, then," exclaimed Watson, impatiently waiting to understand how Sherlock Holmes has built a reputation as the cleverest detective in the world.
"When there has been a crime committed we must reason backwards from what we already know. In the every-day affairs of life it is more useful to reason forwards, and so the other comes to be neglected. There are fifty who can reason synthetically for one who can reason analytically."
"What do you mean reasoning synthetically or analytically?"
"Let me see if I can make it clearer. Most people, if you describe a train of events to them, will tell you what the result would be. They can put those events together in their minds, and argue from them that something will come to pass. There are few people, however, who, if you told them a result, would be able to evolve from their own inner consciousness what the steps were which led up to that result. This power is what I mean when I talk of reasoning backwards, or analytically."
"Fascinating."
"Yes, Watson, but this is something well known to logicians and mathematicians. This is a special case of conditional probability called inverse probability. As I said, normally we reason forward about what will happen, but sometimes we need to invert our thinking to understand what has happened."
"How do you do this?"
"Do you recall the adventure of the cardboard box?"
"Of course, that was in 1889, or perhaps 1890."
"We approached the case, you remember, with an absolutely blank mind, which is always an advantage. We had formed no theories. We were simply there to observe and to draw inferences from our observations."
"Yes yes, what of it?"
"We had no reason to prefer any particular theory about what had happened. It is a capital mistake to form conclusions before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts."
"Quite naturally, this seems intuitively obvious to not draw conclusions without having any data."
"Yes, as I always say, data, data, data! One cannot make bricks without clay! However, sometimes what we know prior to obtaining data can help nudge us in the right direction. Remember that we are reasoning backwards, that is, using inverse probability. We want to know the probability of our theory given the evidence. This is called the posterior probability. The information we use to nudge is in the right direction, that is, our hypothesis about what happened, is called the prior probability. What we want to do is look at the evidence and use that to evaluate our prior probability. After all, when my clients come to me they lay all the evidence before me, and I am generally able, by the help of my knowledge of the history of crime, to set them straight. There is a strong family resemblance about misdeeds, and if you have all the details of a thousand at your finger ends, it is odd if you can't unravel the thousand and first. "
"Sounds sensible enough, but how do you know when to use prior information and what do you do if you lack any prior information?"
"This is where it becomes important to not make hasty inferences or let your own fanciful ideas influence your consideration of the evidence. It is important to use what Laplace called the principle of indifference. That is, you must be indifferent to one outcome or the other an have no theories about what happened prior to seeing the evidence. However, it is often quite clear when to use prior information. Consider your days in Afghanistan as a medical doctor. A patient comes to you displaying symptoms of abdominal distress. What do you do?"
"I inspect them for signs of various illnesses, naturally."
"Yes, but many illnesses induce stomach pains and vomiting. How do you know which illnesses to look for?"
"Well, I know that cholera is quite common. Somewhere around ten percent of soldiers contract the illness. It tends to spread quickly when soldiers are forced to live in close quarters."
"Excellent. This is your prior belief, your hypothesis, which you wish to test. You have evidence of it gathered from their symptoms, now you wish to know how likely it is that they have cholera given their symptoms. How often do patients with cholera display the symptoms?"
"Nearly all of them with the infection, 99.9 percent."
"This is important to consider, Watson. This is the likelihood of your evidence, that is the probability that they will display these symptoms given that they have cholera."
"Yes of course, but many other illnesses display these symptoms too."
"Astute observation. This is very important. One also needs to know how likely it is that the patient will display these symptoms given that they do not have cholera."
"I suppose about 25 percent of other illnesses would give similar symptoms."
"Excellent, now all you need to do is consider all of this information to reach your conclusion."
"Yes, Holmes, I very well know how to make a medical diagnosis, and have learned much about crime solving, but how do you do it so well?"
"Don't you see, Watson?! The procedure is the same! It all follows the logic of of Bayes' theorem*."
"What is Bayes' theorem?"
"I said that my methods are available to any student of logic or mathematics. There was a Presbyterian minister by the name of Reverend Thomas Bayes who had left a paper behind after his death detailing a precise formula on how to reason about the causes of events. Just as well the French mathematician Pierre Simon de Laplace had independently discovered the theorem, which he called the theorem of the probability of causes. This theorem is known to mathematicians studying probabilities, and I merely appropriate it for the solving of crimes."
"What is the theorem, then?"
"It's elegantly simple. To calculate the posterior probability, one simply multiplies your prior belief in your hypothesis by the likelihood of the evidence, and divide by the total number of ways one can obtain the evidence. This in turn is found by adding to your numerator the likelihood of your evidence given other hypotheses times your prior belief in your hypothesis not being true, which is simply your prior belief in your hypothesis subtracted from 100 percent."
"That's astonishingly straightforward."
"Then use the information you gave on cholera to calculate the probability of a patient having it given the symptoms displayed."
"Well, my prior belief in the patient having cholera was ten percent, and the likelihood of them displaying those symptoms given that my hypothesis is true was 99.9 percent However, the symptoms are also typical of about 25 percent of other illnesses, in which I am 90 percent confident. So, I just multiply my prior belief in the hypothesis and the likelihood of the symptoms, giving me 9.99 percent. Then I do the same for the likelihood of the symptoms for other illnesses and my confidence in those other illnesses, meaning 25 percent and 90 percent give me 22.5 percent, and then I add the numerator to this to obtain the denominator, giving me 32.49 percent. 9.99 percent divided by this gives me about a 30 percent probability of cholera."
"See? The key is knowing what information to use. It is of the highest importance in the art of detection to be able to recognize, out of a number of facts, which are incidental and which vital. Otherwise your energy and attention must be dissipated instead of being concentrated. This is not as easy as it sounds. Like all other arts, the science of Bayesian inference is one which can only be acquired by long and patient study nor is life long enough to allow any mortal to attain the highest possible perfection in it."
"You never cease to astonish me Holmes. Still, if mastering this method of reasoning requires life long study, how can the typical person reasoning by this method be certain?"
"You can't be certain, as life is long enough to allow any mortal to attain the highest possible perfection in it. Always remember, any truth is better than indefinite doubt."
_______________________________________________________________________
*Bayes' theorem is a theorem in probability that states P(θ|D) = P(D|θ)*P(θ) / P(D), where P(D) is equal to P(D|θ)*P(θ) + P(D|~θ)*P(~θ). This is just a formula that tells us how to update our prior beliefs in the face of new evidence. Let's break down the mathematical symbols to see what they mean. The P(θ|D) is the posterior probability, which is what we want to know. The P(D|θ) is the probability of obtaining the data given that your hypothesis is true. This is called the likelihood. The P(θ) is your prior probability, and last but not least, the P(D) is the total probability of the data occurring. This is referred to as the evidence. So, to further simplify Bayes' theorem: Posterior probability = likelihood * prior probability / evidence.
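To make this concrete, plug in the figures Watson supplies in the dialogue above (a 10 percent prior, a 99.9 percent likelihood, and a 25 percent chance of the symptoms under the other illnesses): P(cholera|symptoms) = (0.999*0.10) / (0.999*0.10 + 0.25*0.90) = 0.0999 / 0.3249 ≈ 0.307, or roughly the 30 percent Watson arrives at.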
To work out a problem using Bayes' theorem you need to identify a few things. First, you need to identify any prior information in which to ground your initial beliefs. More will follow later on what to do if this is not possible. So the first step is to identify P(θ). Secondly, collect your data, and based on the prior knowledge, figure out how likely it is that the data would occur. The second step, then, is to identify P(D|θ). The third step is to calculate the evidence, that is, to identify P(D). Recall that P(D) = P(D|θ)*P(θ) + P(D|~θ)*P(~θ). This may seem scary, but most of the work is already done. At this point, your prior and likelihood are already plugged into the numerator of Bayes' theorem, and given that the first half of the evidence formula is the same as the numerator, you can just plug that result into that half. As for the second half, its terms are easy to figure out. P(~θ) is simply 1 - P(θ), which means you only have to identify P(D|~θ). The P(D|~θ) is just how likely your data are given OTHER hypotheses which are not θ.
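For readers who prefer code, here is a minimal sketch of those three steps in Python. The function name is mine, chosen for illustration, and the numbers are simply Watson's cholera figures from the dialogue above.

```python
def posterior(prior, likelihood, likelihood_other):
    """Bayes' theorem for one hypothesis (theta) against its complement (~theta).

    prior            -- P(theta), the prior probability of the hypothesis
    likelihood       -- P(D|theta), probability of the data if the hypothesis is true
    likelihood_other -- P(D|~theta), probability of the data under the other hypotheses
    """
    evidence = likelihood * prior + likelihood_other * (1 - prior)  # P(D)
    return likelihood * prior / evidence                            # P(theta|D)

# Watson's cholera example: 10% prior, 99.9% likelihood, 25% under other illnesses
print(posterior(0.10, 0.999, 0.25))  # roughly 0.307, Watson's "about 30 percent"
```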
An interesting application of this can be found here.
Bits of dialogue were appropriated from A Scandal in Bohemia, A Study in Scarlet, The Sign of Four, The Yellow Face, The Reigate Puzzle, and The Adventure of the Cardboard Box. These stories are of course the original property of Arthur Conan Doyle, without whom the world would lack the most interesting fictional character of all time and without whom this post would not be possible.