Bayesian inference is statistical inference in which probabilities are interpreted not as frequencies or proportions, but
rather as degrees of belief. The name comes from the frequent use of Bayes' theorem in this
discipline. Bayes' theorem is named after the Reverend Thomas Bayes.
Evidence and the scientific method
Bayesian statisticians claim that methods of Bayesian inference are a formalisation of the scientific method involving collecting evidence which
points towards or away from a given hypothesis. There can never be certainty,
but as more evidence accumulates, the degree of belief in a hypothesis will usually become very high (almost 1) or very low (near
0). Bayes' theorem provides a method for adjusting degrees of belief in the light of new information.
In many cases, the impact of evidence can be summarised in a likelihood
ratio, as expressed in the law of
likelihood. This can be combined with the prior probability
to reflect the original degree of belief and any earlier evidence already taken into account. Before a decision is made, the
loss function also needs to be considered to reflect the consequences of
making an erroneous decision.
There is debate about what informs the original degree of belief. Objective Bayesians seek an objective value for
the degree of probability of a hypothesis being correct, and so do not avoid the philosophical criticisms of objectivism.
Subjective Bayesians hold that the prior probabilities represent subjective degrees of belief, but that repeated application
of Bayes' theorem leads to a high degree of agreement on the posterior probability. They therefore fail to provide an
objective standard for choosing between conflicting hypotheses.
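The claim that repeated updating brings differing priors into agreement can be illustrated with a small sketch. Everything in it is a hypothetical setup chosen for illustration, not taken from the text: the hypothesis H is that a coin lands heads with probability 0.75 (against a fair coin), two observers start with priors 0.1 and 0.9, and both see the same fixed sequence of tosses.

```python
def update(prior, p_data_if_h, p_data_if_not_h):
    """One application of Bayes' theorem for a hypothesis H and its negation."""
    numerator = prior * p_data_if_h
    return numerator / (numerator + (1.0 - prior) * p_data_if_not_h)

# Hypothetical likelihoods: H says heads has probability 0.75, not-H says 0.5.
P_HEADS_IF_H = 0.75
P_HEADS_IF_NOT_H = 0.5

# A fixed, made-up sequence of 40 tosses (30 heads, 10 tails).
tosses = "HHHT" * 10

# Two observers with very different prior degrees of belief in H.
sceptic, believer = 0.1, 0.9
initial_gap = believer - sceptic

for toss in tosses:
    if toss == "H":
        like_h, like_not_h = P_HEADS_IF_H, P_HEADS_IF_NOT_H
    else:
        like_h, like_not_h = 1.0 - P_HEADS_IF_H, 1.0 - P_HEADS_IF_NOT_H
    sceptic = update(sceptic, like_h, like_not_h)
    believer = update(believer, like_h, like_not_h)

final_gap = believer - sceptic
print(f"sceptic: {sceptic:.3f}, believer: {believer:.3f}, gap: {final_gap:.3f}")
```

Under these assumptions both posteriors end up above 0.9 and the gap between them shrinks from 0.8 to well under 0.1. The observers agree more closely, although neither prior was objectively justified.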
Simple examples of Bayesian inference
From which bowl is the cookie?
To illustrate, suppose there are two bowls full of cookies. Bowl #1 has 10 chocolate chip and 30 plain cookies, while bowl #2
has 20 of each. Our friend Fred picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to
believe Fred treats one bowl differently from another, likewise for the cookies. The cookie turns out to be a plain one. How
probable is it that Fred picked it out of bowl #1?
Intuitively, it seems clear that the answer should be more than a half, since there are more plain cookies in bowl #1. The
precise answer is given by Bayes' theorem. Let H1 correspond to bowl #1, and H2 to bowl
#2. It is given that the bowls are identical from Fred's point of view, thus P(H1) =
P(H2), and the two must add up to 1, so both are equal to 0.5. The "data" D consists in the
observation of a plain cookie. From the contents of the bowls, we know that P(D | H1) = 30/40 = 0.75
and P(D | H2) = 20/40 = 0.5. Bayes' formula then yields
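- P(H1 | D) = P(H1) P(D | H1) / (P(H1) P(D | H1) + P(H2) P(D | H2)) = (0.5 × 0.75) / (0.5 × 0.75 + 0.5 × 0.5) = 0.6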
Before observing the cookie, the probability that Fred chose bowl #1 is the prior probability,
P(H1), which is 0.5. After observing the cookie, we revise the probability to
P(H1|D), which is 0.6.
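The cookie calculation can be checked with a short sketch. The function name `posterior` and the dictionary keys are illustrative choices, not part of the example itself; exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return P(H | D) for every hypothesis H, using Bayes' theorem.

    priors:      dict mapping hypothesis -> P(H)
    likelihoods: dict mapping hypothesis -> P(D | H)
    """
    # Unnormalised weights P(H) * P(D | H).
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    # P(D), by the law of total probability.
    evidence = sum(joint.values())
    return {h: joint[h] / evidence for h in joint}

# Fred picks either bowl with equal probability.
priors = {"bowl1": Fraction(1, 2), "bowl2": Fraction(1, 2)}
# Probability of drawing a plain cookie from each bowl.
plain = {"bowl1": Fraction(30, 40), "bowl2": Fraction(20, 40)}

result = posterior(priors, plain)
print(result["bowl1"])  # 3/5
```

The same function accepts any number of hypotheses, provided the priors sum to 1.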
False positives in a medical test
False positives are a problem in any kind of test: no test is perfect, and sometimes the test will incorrectly report a positive result. For example, if a
test for a particular disease is performed on a patient, then there is a chance (usually small) that the test will return a positive result even if the patient
does not have the disease. The problem lies, however, not just in the chance of a false positive prior to testing, but in
determining the chance that a positive result is in fact a false positive. As we will demonstrate, using Bayes' theorem, if a
condition is rare, then the majority of positive results may be false positives, even if the test for that condition is
(otherwise) reasonably accurate.
Suppose that a test for a particular disease has a very high success rate:
- if a tested patient has the disease, the test accurately reports this, a 'positive', 99% of the time (or, with probability
0.99), and
- if a tested patient does not have the disease, the test accurately reports that, a 'negative', 95% of the time (i.e.
with probability 0.95).
Suppose also, however, that only 0.1% of the population have that disease (i.e. with probability 0.001). We now have
all the information required to use Bayes' theorem to calculate the probability that a positive test result is a
false positive.
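Let A be the event that the patient has the disease, and B be the event that the test returns a positive
result. Then, using Bayes' theorem with the denominator P(B) expanded by the law of total probability, the probability of a
true positive is
- P(A | B) = P(B | A) P(A) / (P(B | A) P(A) + P(B | not A) P(not A)) = (0.99 × 0.001) / (0.99 × 0.001 + 0.05 × 0.999) ≈ 0.019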
and hence the probability of a false positive is about (1 − 0.019) = 0.981.
Despite the apparent high accuracy of the test, the incidence of the disease is so low (one in a thousand) that the vast
majority of patients who test positive (98 in a hundred) do not have the disease. (Nonetheless, the proportion of positive-testing
patients who do have the disease, about 0.019, is nearly 20 times the proportion before we knew the outcome of the test, 0.001!
The test is not useless, and re-testing may improve the reliability of the result.) In
particular, a test must be very reliable in reporting a negative result when the patient does not have the disease, if it is to
avoid the problem of false positives. In mathematical terms, this would ensure that the second term in the denominator of the
above calculation is small, relative to the first term. For example, if the test reported a negative result in patients without
the disease with probability 0.999, then using this value in the calculation yields a probability of a false positive of roughly
0.5.
In this example, Bayes' theorem helps show that the accuracy of tests for rare conditions must be very high in order to
produce reliable results from a single test, due to the possibility of false positives. (The probability of a 'false negative'
could also be calculated using Bayes' theorem, to completely characterise the possible errors in the test results.)
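The figures in this example can be reproduced with a brief sketch. The function and variable names are illustrative. The re-test line additionally assumes that a second test is independent of the first given the patient's true disease status, which the example does not state.

```python
def prob_disease_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) by Bayes' theorem."""
    true_pos = sensitivity * prevalence                    # P(B | A) P(A)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(B | not A) P(not A)
    return true_pos / (true_pos + false_pos)

def prob_disease_given_negative(prevalence, sensitivity, specificity):
    """P(disease | negative test): the chance a negative is a false negative."""
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    return false_neg / (false_neg + true_neg)

# The test described above: 99% sensitive, 95% specific, 0.1% prevalence.
ppv = prob_disease_given_positive(0.001, 0.99, 0.95)
print(f"{ppv:.3f}")         # 0.019
print(f"{1.0 - ppv:.3f}")   # 0.981, the share of positives that are false

# A more specific test (0.999) brings the false-positive share to about one half.
ppv_better = prob_disease_given_positive(0.001, 0.99, 0.999)
print(f"{ppv_better:.3f}")  # 0.498

# Re-testing a positive patient: the first posterior becomes the new prior.
# ASSUMPTION: the two test results are conditionally independent.
ppv_retest = prob_disease_given_positive(ppv, 0.99, 0.95)
print(f"{ppv_retest:.3f}")  # 0.282

false_negative_share = prob_disease_given_negative(0.001, 0.99, 0.95)
```

Under the independence assumption, a second positive result raises the probability of disease from about 0.019 to about 0.28, which illustrates why re-testing can help.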
In the courtroom
Bayesian inference can be used to coherently assess additional evidence of guilt in a court setting.
- Let G be the event that the defendant is guilty.
- Let E be the event that the defendant's DNA matches DNA found at the crime scene.
- Let p(E | G) be the probability of seeing event E assuming that the defendant is guilty. (Usually this would be taken to be
unity.)
- Let p(G | E) be the probability that the defendant is guilty assuming the DNA match event E.
- Let p(G) be the probability that the defendant is guilty, based on the evidence other than the DNA match.
Bayesian inference tells us that if we can assign a probability p(G) to the defendant's guilt before we take the DNA evidence
into account, then we can revise this probability to the conditional probability p(G | E), since
- p(G | E) = p(G) p(E | G) / p(E)
Suppose, on the basis of other evidence, a juror decides that there is a 30% chance that the defendant is guilty. Suppose also
that the forensic evidence is that the probability that a person chosen at random would have DNA that matched that at the crime
scene is 1 in a million, or 10⁻⁶.
The event E can occur in two ways. Either the defendant is guilty (with prior probability 0.3) and thus his DNA is present
with probability 1, or he is innocent (with prior probability 0.7) and he is unlucky enough to be one of the 1 in a million
matching people.
Thus the juror could coherently revise his opinion to take into account the DNA evidence as follows:
- p(G | E) = 0.3 × 1.0 / (0.3 × 1.0 + 0.7 × 10⁻⁶) = 0.99999766667.
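The same revision can be written in odds form, where the DNA evidence enters as a single likelihood ratio p(E | G) / p(E | not G) that multiplies the juror's prior odds. The following is a minimal sketch of that equivalent calculation; the function name `update_odds` is an illustrative choice.

```python
def update_odds(prior_prob, likelihood_ratio):
    """Revise a probability via posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    # Convert the odds back to a probability.
    return posterior_odds / (1.0 + posterior_odds)

p_match_if_guilty = 1.0     # p(E | G), taken to be unity
p_match_if_innocent = 1e-6  # chance that a random person matches
likelihood_ratio = p_match_if_guilty / p_match_if_innocent

p_guilty = update_odds(0.3, likelihood_ratio)
print(p_guilty)  # approximately 0.9999977
```

Writing the update this way separates the juror's prior (odds of 3 to 7) from the strength of the forensic evidence (a likelihood ratio of one million), so either can be varied on its own.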
In the United Kingdom, Bayes' theorem was explained by an expert
witness to the jury in the case of Regina versus Denis Adams. The case went to Appeal and the Court of Appeal gave their
opinion that the use of Bayes' theorem was inappropriate for jurors.
More mathematical examples
Naive Bayes classifier
See: naive Bayesian classification.
Posterior distribution of the binomial parameter
In this example we consider the computation of the posterior distribution for the binomial parameter. This is the same problem
considered by Bayes in Proposition 9 of his essay.
We are given m observed successes and n observed failures in a binomial experiment. The experiment may be
tossing a coin, drawing a ball from an urn, or asking someone their opinion, among many other possibilities. What we know about
the parameter (let's call it a) is stated as the prior distribution, p(a).
For a given value of a, the probability of m successes in m+n trials is
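- p(m, n | a) = C(m+n, m) a^m (1 − a)^n, where C(m+n, m) = (m+n)! / (m! n!) is the binomial coefficient.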
Since m and n are fixed, and a is unknown, this is a likelihood function for a. From the
continuous form of the law of total probability we have
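- p(a | m, n) = p(m, n | a) p(a) / ∫₀¹ p(m, n | a) p(a) da = a^m (1 − a)^n p(a) / ∫₀¹ a^m (1 − a)^n p(a) da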
For some special choices of the prior distribution p(a), the integral can be solved and the posterior takes
a convenient form. In particular, if p(a) is a beta
distribution with parameters m0 and n0, then the posterior is also a beta
distribution with parameters m+m0 and n+n0.
A conjugate prior is a prior distribution, such as the beta distribution in the above example, which has the property
that the posterior is the same type of distribution.
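The conjugate update can be checked numerically with a short sketch. The counts m = 7, n = 3 and the prior parameters m0 = n0 = 2 are made-up values for illustration, and the helper names are hypothetical. The code compares the closed-form beta posterior with the posterior obtained by integrating the denominator directly.

```python
from math import exp, lgamma, log

def beta_pdf(a, alpha, beta):
    """Density of the beta distribution with parameters alpha, beta at 0 < a < 1."""
    log_norm = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
    return exp(log_norm + (alpha - 1) * log(a) + (beta - 1) * log(1 - a))

def posterior_params(m, n, m0, n0):
    """Conjugate update: a beta(m0, n0) prior becomes a beta(m + m0, n + n0) posterior."""
    return m + m0, n + n0

def numeric_posterior(a, m, n, m0, n0, steps=20000):
    """Posterior density at a, with the integral over [0, 1] done by the midpoint rule."""
    def unnormalised(x):
        # Likelihood a^m (1 - a)^n times the prior; the binomial coefficient cancels.
        return x**m * (1 - x)**n * beta_pdf(x, m0, n0)
    h = 1.0 / steps
    evidence = sum(unnormalised((i + 0.5) * h) for i in range(steps)) * h
    return unnormalised(a) / evidence

m, n = 7, 3    # hypothetical observed successes and failures
m0, n0 = 2, 2  # hypothetical prior parameters

alpha_post, beta_post = posterior_params(m, n, m0, n0)
closed_form = beta_pdf(0.6, alpha_post, beta_post)
numeric = numeric_posterior(0.6, m, n, m0, n0)
posterior_mean = alpha_post / (alpha_post + beta_post)
print(alpha_post, beta_post)  # 9 5
```

With these values the two densities at a = 0.6 agree to within the integration error, and the posterior mean is 9/14.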
What is "Bayesian" about Proposition 9 is that Bayes presented it as a probability for the parameter a. That is, not
only can one compute probabilities for experimental outcomes, but also for the parameter which governs them, and the same algebra
is used to make inferences of either kind. Interestingly, Bayes actually states his question in a way that might make the idea of
assigning a probability distribution to a parameter palatable to a frequentist. He supposes that a billiard ball is thrown at
random onto a billiard table, and that a and 1 − a are the probabilities that subsequent billiard
balls will fall above or below the first ball. By making the binomial parameter a depend on a random event, he cleverly
escapes a philosophical quagmire of which he was most likely not even aware.
Computer applications
Bayesian inference has applications in artificial
intelligence and expert systems. Bayesian inference techniques have
been a fundamental part of computerized pattern recognition
techniques since the late 1950s.
There is growing interest in using Bayesian inference to filter spam. Examples
include Bogofilter, SpamAssassin and Mozilla.
In some applications fuzzy logic is an alternative to Bayesian inference.
Fuzzy logic and Bayesian inference, however, are mathematically and semantically not compatible: in general, the degree of
truth in fuzzy logic cannot be understood as a probability, nor a probability as a degree of truth.