CLINICAL APPLICATIONS

Medical Decision Making, March–April 2005
DOI: 10.1177/0272989X05275156
Medical Decision Making with Incomplete
Evidence—Choosing a Platelet
Glycoprotein IIbIIIa Receptor Inhibitor for
Percutaneous Coronary Interventions
James M. Brophy, MD, PhD, FRCP(c), FACC, Lawrence Joseph, PhD
Background. Medical decision making must often be performed despite incomplete evidence. An example is the
choice of a glycoprotein IIb/IIIa (GP2b3a) inhibitor, a class of
potent antiplatelet medications, as adjunctive therapy during percutaneous coronary interventions (PCIs). GP2b3a inhibitor efficacy in reducing adverse outcomes has been well
documented with multiple placebo-controlled randomized
trials, but there is a paucity of comparative data about their
individual equivalency. Substantial cost differentials are also
present between the drugs. Methods. A systematic review of
the literature was performed to identify all randomized
placebo-controlled trials of GP2b3a inhibitors as adjunctive
therapy for PCI. Three complementary methods were used to
assist in decision making regarding drug equivalency. First,
the data from the single direct comparative trial are analyzed
from a Bayesian perspective. Next, prior information from
other GP2b3a inhibitor trials in similar but not identical patient populations is incorporated. In the 3rd method, indirect
comparisons of GP2b3a inhibitors are carried out using a hierarchical meta-analytic model of the placebo-controlled trials identified by the systematic review. Results. A total of 12
randomized trials were identified involving 3 agents (abciximab, eptifibatide, tirofiban), but only 1 involved a direct comparison of 2 drugs (abciximab v. tirofiban). In contradiction to the original publication, the authors’ Bayesian analysis both without (method 1) and with (method 2) the inclusion of some prior information suggests a reasonable probability of equivalency. The indirect comparisons from all randomized placebo-controlled trials (method 3) also failed to provide support for superiority of any agent over the others. Conclusion. Decision making with incomplete evidence is a difficult but frequently occurring medical dilemma. The authors propose 3 methods that may elucidate the process and illustrate them in the context of the choice of GP2b3a inhibitor for adjunctive therapy during PCI. Further data may or may not eventually lead to a different conclusion, but based on the evidence available to date, the authors’ 3 methods suggest clinical equivalency between GP2b3a inhibitors, in contrast to the initial conclusions from the single comparative randomized trial. Key words: percutaneous coronary interventions; platelet glycoprotein IIbIIIa receptor inhibitor; decision making; Bayesian analysis. (Med Decis Making 2005;25:222–228)

The prevailing paradigm for clinical decision making revolves around evidence-based medicine, with randomized clinical trials representing the zenith of experimental comparative designs. When a sufficient number of trials for the treatment of interest have been run in the target population, this may be appropriate. But what if there are only a few trials, or even just a single trial? Or what if the trials do not contain head-to-head comparisons of 2 competing treatments, or are conducted in populations different from that of current interest? In general, how should we proceed in the absence of unequivocal evidence from sufficient numbers of concordant clinical trials in the population of interest? Decisions must still be made. In this article, we illustrate several techniques that may be useful to reach reasonable, although possibly tentative, decisions in the face of less than ideal evidence.

Received 23 April 2004 from the Division of Cardiology, Department of Medicine, and Department of Epidemiology and Biostatistics, McGill University, Montréal, Québec. Revision accepted for publication 27 September 2004. Address correspondence to James M. Brophy, Division of Cardiology, Royal Victoria Hospital, 687 Pine Avenue West Room R4.12, Montréal, Québec H2L 4M1, Canada; james.brophy@mcgill.ca. DOI: 10.1177/0272989X05275156
Ischemic heart disease remains the leading cause of
patient mortality and morbidity in Western countries.
Percutaneous coronary interventions (PCIs) have become a widely accepted therapy for the symptoms of
ischemic heart disease, with more than 600,000 angioplasties performed annually in the United States.1 PCI
is a generally safe technique with low rates of mortality
and morbidity, but uncontrolled plaque rupture may
expose underlying plaque debris, stimulating platelet
glycoprotein IIb/IIIa receptor (GP2b3a) activation and
resulting in platelet aggregation and thrombosis. This
may mediate the complications associated with interventional procedures, including death and myocardial
infarction (MI). Three GP2b3a inhibitors (abciximab,
eptifibatide, and tirofiban) are commercially available,
and although approved indications as well as individual properties are slightly different, all will suppress
platelet aggregation by at least 80% at therapeutic levels. The majority of PCI procedures are now done under protection of these medications,2 but there has been
only 1 direct comparative randomized trial.3 A narrow
interpretation of “evidence-based medicine” might
base the choice of agent on these results, but existing
practice guidelines have not made specific recommendations as to the choice of drug, perhaps implicitly recognizing the paucity of data. Nevertheless, clinicians
and health care managers must make decisions as to
whether meaningful health benefits exist between the
agents, thereby possibly justifying differences in acquisition costs.
As an illustrative example of decision making with
incomplete evidence and as an extension of previous
work using indirect comparisons to circumvent a lack
of direct comparative trials,4,5 we present several different methods to assess the equivalency of GP2b3a inhibitors as adjunctive therapy for PCI.
METHODS

Literature Review

Following the “Users’ Guides to the Medical Literature,”6 our 1st step, after framing the question, was to find all available evidence. We performed a systematic electronic search for all randomized controlled trials of GP2b3a inhibitors as adjunctive therapy for PCI. Using PUBMED and the key words randomized controlled trial, angioplasty, glycoprotein, and inhibition, we identified 160 articles published up to 31 March 2004. Trials had to report the meaningful clinical end points of death and MI. After identification of all pertinent information and summarization using a hierarchical model, we applied the following 3 techniques to assist in data interpretation and decision making.

Method 1—Objective Bayesian Analysis

In method 1, an objective Bayesian analysis, the data from the only comparative trial are examined from a Bayesian perspective, but without incorporating any of the information available outside of this single trial. Standard statistical analyses of randomized clinical trials, including the original analysis reported for this trial,3 fail to provide a direct estimate of the probability of treatment superiority, the probability that a clinically meaningful difference exists, or the probability of clinical equivalence, for any given clinical cutpoint. A Bayesian analysis, whether or not formally incorporating prior beliefs based on previous trials or other evidence, permits the calculation of these important and highly clinically relevant probabilities, allowing for more lucid decision making. In this 1st analysis, we assumed a noninformative prior so that the final probabilities are determined almost exclusively from the observed data of the comparative trial. We used a simple beta-binomial model. As the sample size in the trial was large, a Haldane or beta(0,0) prior distribution was used for the 2 probabilities of interest, that is, the probability of a clinical event (death or nonfatal MI at 30 d) with either drug, and a binomial likelihood represented the information contained in the observed comparative data. Probability differences were derived by forming the difference between the 2 beta posterior densities, and probabilities of clinical interest were calculated as areas under the curve of this density.7

Method 2—Bayesian Analysis Incorporating Prior Information

In method 2, we further develop our Bayesian approach by including information from past trials in the form of a prior distribution. In particular, we use indirect evidence from placebo-controlled trials of GP2b3a inhibitors in acute coronary syndrome patients, a similar, but not identical, patient population to the PCI trials. This furnishes separate prior beliefs for the effectiveness of each of the treatments used in these trials compared to placebo. Taking the difference of the 2 treatment effects (i.e., tirofiban compared to placebo and abciximab compared to placebo) provides an indirect estimate of the tirofiban to abciximab difference that is of main interest. A normal distribution is then fit
to this estimate, which forms our prior distribution.
These prior beliefs are then updated by the direct comparative data from the direct comparative (TARGET)
trial3 via normal distribution updating, approximating
the binomial likelihood function with a normal density. Because the sample sizes for all of these trials are
quite large, normal approximations fit very well. With
smaller trials, exact methods can be used.7 Because the
prior information is derived from indirect evidence
and on a population not identical to that in the target
trial, one can argue that the prior evidence should be
discounted. At 1 extreme, one would completely discount all previous studies, leading to an analysis identical to the objective Bayesian approach (method 1) described above. At the other extreme, there would be no
discounting. We performed a sensitivity analysis
whereby the degree of discounting of the prior information from the indirect studies is varied, leading to a
range of posterior distributions from which probabilities could be calculated.
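The sensitivity analysis described above can be sketched as a precision-weighted normal update in which a discount factor scales the prior precision. This is an illustrative sketch, not the authors' code; the function name and all numbers (on a log hazard ratio scale) are invented for demonstration.

```python
def discounted_posterior(prior_mean, prior_sd, data_mean, data_sd, discount):
    """Normal-normal update; `discount` in [0, 1] scales the prior
    precision (0 = ignore the prior entirely, 1 = use it at full weight)."""
    prior_prec = discount / prior_sd ** 2
    data_prec = 1.0 / data_sd ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    return post_mean, post_prec ** -0.5

# Invented numbers: an indirect prior centered at no difference, and a
# direct-trial estimate favoring one drug, at three discount levels.
for k in (0.0, 0.5, 1.0):
    mean, sd = discounted_posterior(0.0, 0.20, 0.23, 0.10, k)
```

With no discounting of the data (discount = 0) the posterior reproduces the direct-trial estimate; full prior weight pulls the posterior mean toward the prior and tightens the standard deviation.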
Method 3—Indirect Comparisons via
Hierarchical Bayesian Meta-Analysis
In the 3rd method, we again perform an indirect
comparison in which GP2b3a inhibitors are compared
from the systematic review of the placebo-controlled
trials using a Bayesian hierarchical meta-analytic
model.8 At the 1st level, each subject in each arm of
each trial is assumed to follow a binomial distribution,
with separate probabilities of events for each arm of
each trial. At the 2nd level, the logarithms of the odds ratios from the individual studies are assumed to follow a normal
distribution. Separate meta-analytic models are run for
all trials of abciximab versus placebo, and for the trials
of eptifibatide or tirofiban versus placebo, so that the
end result of the meta-analyses is 3 posterior distributions, each summarizing the effect of drug versus placebo in all available trials. Taking the ratio of these posterior distributions then produces a ratio of these odds
ratios. Values larger than 1 indicate superiority of
abciximab compared to eptifibatide or tirofiban.
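The indirect comparison can be sketched by Monte Carlo: draw from each drug-versus-placebo posterior (summarized below by invented normal approximations on the log odds ratio scale, not the paper's estimates) and form the ratio of odds ratios.

```python
import math
import random

random.seed(1)

def draws(mean, sd, n=20_000):
    """Samples from a normal posterior summary of a log odds ratio."""
    return [random.gauss(mean, sd) for _ in range(n)]

# Invented posterior summaries (log OR of adverse events, drug vs. placebo):
log_or_abciximab = draws(-0.30, 0.12)
log_or_tirofiban = draws(-0.25, 0.20)

# Ratio of odds ratios: OR_tirofiban / OR_abciximab.  Values above 1 mean
# tirofiban patients have higher event odds, i.e. abciximab is superior.
ratio = [math.exp(t - a) for t, a in zip(log_or_tirofiban, log_or_abciximab)]
p_abciximab_superior = sum(r > 1 for r in ratio) / len(ratio)
```

A posterior probability near 0.5 would, as in the paper's conclusion, fail to provide convincing support for the superiority of either agent.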
RESULTS
Available Evidence
The systematic literature search found 11 randomized placebo-controlled trials of GP2b3a inhibitors as
adjunctive therapy to PCI, 8 with abciximab,9–16 2 with
eptifibatide,17,18 and 1 with tirofiban.19 Despite the
high-risk populations studied in these 11 trials, death
rates have been remarkably low for both those receiving GP2b3a drugs (93 deaths in 10,421 patients, 0.9%)
and placebo (103 deaths in 8124 patients, 1.3%). Comparing death rates in GP2b3a drugs to placebo results in
a wide confidence interval for the odds ratio (OR 0.761,
95% CI [0.546, 1.07]), indicating an inconclusive result. Although not including any head-to-head comparisons of the 3 drugs under consideration here, these
trials are of high quality with little evidence of selection, performance, or attrition biases. Therefore, indirect comparisons can be formed,20 as described in
method 3 above.
The systematic literature search identified only 1
head-to-head double-blinded randomized trial comparing 2 GP2b3a inhibitors in the setting of modern PCI
with coronary stenting.3 This TARGET trial was designed to demonstrate the noninferiority of tirofiban as
compared with abciximab. The end point (composite
of death or nonfatal MI at 30 d) occurred more frequently among the 2398 patients in the tirofiban group
than among the 2411 patients in the abciximab group
(7.2% v. 5.7%; hazard ratio [HR] 1.26; upper bound of the 1-sided 95% CI, 1.51) and consequently failed to meet the prespecified
limit for noninferiority of tirofiban. The equivalency requirement was an upper bound of the 95% CI of the hazard ratio for the comparison of tirofiban with abciximab of less than 1.47.

By Bayes’s law,

P(a | b) = P(a) P(b | a) / P(b) ;

so,

P(a | b) = .5 * .7 / .3 = .35 / .3 > 1 .
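The inconsistency in numbers like these can be checked mechanically. The sketch below (plain Python, hypothetical helper name) simply evaluates Bayes's law and shows that the implied conditional probability exceeds 1.

```python
def implied_p_a_given_b(p_a, p_b_given_a, p_b):
    """Bayes's law: P(a | b) = P(a) P(b | a) / P(b)."""
    return p_a * p_b_given_a / p_b

# The three numbers from the text cannot all be right: the implied
# conditional probability comes out greater than 1.
val = implied_p_a_given_b(0.5, 0.7, 0.3)   # 0.35 / 0.3, about 1.17
```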
Needless to say, in a system with a lot of
such numbers, making sure they are consistent
can be a problem, and one system (PROSPECTOR)
had to implement special-purpose techniques
to handle such inconsistencies (Duda, Hart,
and Nilsson 1976). Therefore, it is a nice
property of the Bayesian networks that if you
specify the required numbers (the probability
of every node given all possible combinations
of its parents), then (1) the numbers will be
consistent and (2) the network will uniquely
define a distribution. Furthermore, it is not
too hard to see that this claim is true. To see
it, we must first introduce the notion of joint
distribution.
A joint distribution of a set of random variables v1 … vn is defined as P(v1 … vn) for all values of v1 … vn. That is, for the set of Boolean variables (a,b), we need the probabilities P(a b), P(¬a b), P(a ¬b), and P(¬a ¬b). A joint distribution for a set of random variables gives all the information there is about the distribution. For example, suppose we had the just-mentioned joint distribution for (a,b), and we wanted to compute, say, P(a | b):

P(a | b) = P(a b) / P(b) = P(a b) / (P(a b) + P(¬a b)) .
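As a concrete sketch (Python, with invented numbers for the joint table), computing P(a | b) from a joint distribution is just summing and dividing:

```python
# Hypothetical joint distribution over two Boolean variables (a, b);
# the four probabilities sum to 1.
joint = {(True, True): 0.30, (False, True): 0.20,
         (True, False): 0.25, (False, False): 0.25}

def prob(event, joint):
    """Probability that `event` (a predicate on an outcome) holds."""
    return sum(pr for outcome, pr in joint.items() if event(outcome))

# P(a | b) = P(a b) / P(b) = P(a b) / (P(a b) + P(not-a b))
p_ab = prob(lambda o: o[0] and o[1], joint)
p_b = prob(lambda o: o[1], joint)
p_a_given_b = p_ab / p_b   # 0.30 / 0.50 = 0.6
```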
Figure 5. A Topological Ordering. In this case, I made it a simple top-down numbering: family-out (1), bowel-problem (2), light-on (3), dog-out (4), hear-bark (5).
Note that for n Boolean variables, the joint distribution contains 2^n values. However, the sum of all the joint probabilities must be 1 because the probability of all possible outcomes must be 1. Thus, to specify the joint distribution, one needs to specify 2^n − 1 numbers, thus the 2^n − 1 in the last section.
I now show that the joint distribution for a
Bayesian network is uniquely defined by the
product of the individual distributions for
each random variable. That is, for the network in figure 2 and for any combination of values fo, bp, lo, do, hb (for example, t, f, f, t, t), the joint probability is

P(fo bp lo do hb) = P(fo)P(bp)P(lo | fo)P(do | fo bp)P(hb | do) .
Consider a network N consisting of variables v1 … vn. Now, an easily proven law of
probability is that
P(v1 … vn) = P(v1)P(v2 | v1) … P(vn | v1 … vn-1).
This equation is true for any set of random
variables. We use the equation to factor our
joint distribution into the component parts
specified on the right-hand side of the equation. Exactly how a particular joint distribution
is factored according to this equation depends
on how we order the random variables, that
is, which variable we make v1, v2, and so on.
For the proof, I use what is called a topological
sort on the random variables. This sort is an
ordering of the variables such that every variable comes before all its descendants in the
graph. Let us assume that v1 … vn is such an
ordering. In figure 5, I show one such ordering for figure 1.
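A topological ordering like the one in figure 5 can be computed with the standard in-degree (Kahn) algorithm. The sketch below uses the node names of the example network from the text; the code itself is not from the article.

```python
from collections import deque

# Edges of the family-out network (parent -> children).
edges = {"family-out": ["light-on", "dog-out"],
         "bowel-problem": ["dog-out"],
         "dog-out": ["hear-bark"],
         "light-on": [], "hear-bark": []}

def topological_order(edges):
    """Return the nodes so that every node precedes all its descendants."""
    indeg = {v: 0 for v in edges}
    for children in edges.values():
        for c in children:
            indeg[c] += 1
    queue = deque(v for v, d in indeg.items() if d == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in edges[v]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return order

order = topological_order(edges)
```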
Let us consider one of the terms in this
product, P(vj | v1 … vj – 1). An illustration of what
nodes v 1 … v j might look like is given in
figure 6. In this graph, I show the nodes
immediately above vj and otherwise ignore
everything except vc, which we are concentrating on and which connects with vj in two different ways that we call the left and right paths, respectively.

Figure 6. Node vj in a Network. I show that when conditioning vj only on its predecessors, its value is dependent only on its immediate predecessors, vj – 1 and vj – 2.

Figure 7. Nodes in a Singly Connected Network. Because of the singly connected property, any two nodes connected to node e have only one path between them—the path that goes through e.

AI Magazine, Winter 1991

We can see from figure 6
that none of the conditioning nodes (the nodes
being conditioned on in the conditional
probability) in P(vj | v1 … vj – 1) is below vj (in
particular, v m is not a conditioning node).
This condition holds because of the way in
which we did the numbering.
Next, we want to show that all and only
the parents of vj need be in the conditioning
portion of this term in the factorization. To
see that this is true, suppose vc is not immediately above vj but comes before vj in the numbering. Then any path between vc and vj must
either be blocked by the nodes just above vj
(as is the right path from vc in figure 6) or go
through a node lower than vj (as is the left
path in figure 6). In this latter case, the path
is not d-connecting because it goes through a
converging node vm where neither it nor any
of its descendants is part of the conditioning
nodes (because of the way we numbered).
Thus, no path from vc to vj can be d-connecting, and we can eliminate vc from the conditioning section because by the independence
assumptions in Bayesian networks, vj is independent of vc given the other conditioning
nodes. In this fashion, we can remove all the
nodes from the conditioning case for P(vj | v1
… vj – 1) except those immediately above vj .
In figure 6, this reduction would leave us
with P(vj | vj – 1 vj – 2). We can do this for all
the nodes in the product. Thus, for figure 2,
we get
P(fo bp lo do hb) = P(fo)P(bp)P(lo | fo)P(do | fo
bp)P(hb | do) .
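To make the factorization concrete, the sketch below builds the joint for the five-variable network from hypothetical CPT numbers (the article's actual numbers live in its figures, so these are invented) and checks that the products sum to 1:

```python
# Invented CPTs for the family-out network:
p_fo = 0.15
p_bp = 0.01
p_lo = {True: 0.60, False: 0.05}                    # P(lo | fo)
p_do = {(True, True): 0.99, (True, False): 0.90,
        (False, True): 0.97, (False, False): 0.30}  # P(do | fo, bp)
p_hb = {True: 0.70, False: 0.01}                    # P(hb | do)

def bern(p, value):
    """Probability of a Boolean value under a Bernoulli parameter p."""
    return p if value else 1.0 - p

def joint(fo, bp, lo, do, hb):
    """P(fo bp lo do hb) = P(fo)P(bp)P(lo|fo)P(do|fo bp)P(hb|do)."""
    return (bern(p_fo, fo) * bern(p_bp, bp) * bern(p_lo[fo], lo)
            * bern(p_do[(fo, bp)], do) * bern(p_hb[do], hb))

# The 2**5 = 32 joint probabilities defined this way sum to 1.
total = sum(joint(fo, bp, lo, do, hb)
            for fo in (True, False) for bp in (True, False)
            for lo in (True, False) for do in (True, False)
            for hb in (True, False))
```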
We have shown that the numbers specified
by the Bayesian network formalism in fact
define a single joint distribution, thus
uniqueness. Furthermore, if the numbers for
each local distribution are consistent, then
the global distribution is consistent. (Local
consistency is just a matter of having the
right numbers sum to 1.)
Evaluating Networks
As I already noted, the basic computation on
belief networks is the computation of every
node’s belief (its conditional probability)
given the evidence that has been observed so
far. Probably the most important constraint
on the use of Bayesian networks is the fact
that in general, this computation is NP-hard
(Cooper 1987). Furthermore, the exponential
time limitation can and does show up on
realistic networks that people actually want
solved. Depending on the particulars of the
network, the algorithm used, and the care
taken in the implementation, networks as
small as tens of nodes can take too long, or
networks in the thousands of nodes can be
done in acceptable time.
The first issue is whether one wants an
exact solution (which is NP-hard) or if one
can make do with an approximate answer
(that is, the answer one gets is not exact but
with high probability is within some small
distance of the correct answer). I start with
algorithms for finding exact solutions.
Exact Solutions
Although evaluating Bayesian networks is, in
general, NP-hard, there is a restricted class of
networks that can efficiently be solved in
time linear in the number of nodes. The class
is that of singly connected networks. A singly
connected network (also called a polytree) is one
in which the underlying undirected graph has
no more than one path between any two
nodes. (The underlying undirected graph is
the graph one gets if one simply ignores the
directions on the edges.) Thus, for example,
the Bayesian network in figure 5 is singly connected, but the network in figure 6 is not.
Note that the direction of the arrows does not
matter. The left path from vc to vj requires one
to go against the direction of the arrow from
vm to vj. Nevertheless, it counts as a path from
vm to vj.
The algorithm for solving singly connected
Bayesian networks is complicated, so I do not
give it here. However, it is not hard to see
why the singly connected case is so much
easier. Suppose we have the case sketchily
illustrated in figure 7 in which we want to
know the probability of e given particular
values for a, b, c, and d. We specify that a and
b are above e in the sense that the last step in
going from them to e takes us along an arrow
pointing down into e. Similarly, we assume c
and d are below e in the same sense. Nothing
in what we say depends on exactly how a and
b are above e or how d and c are below. A
little examination of what follows shows that
we could have any two sets of evidence (possibly empty) being above and below e rather
than the sets {a b} and {c d}. We have just
been particular to save a bit on notation.
What does matter is that there is only one
way to get from any of these nodes to e and
that the only way to get from any of the
nodes a, b, c, d to any of the others (for exam-
ple, from b to d) is through e. This claim follows from the fact that the network is singly
connected. Given the singly connected condition, we show that it is possible to break up
the problem of determining P(e | a b c d) into
two simpler problems involving the network
from e up and the network from e down.
First, from Bayes’s rule,
P(e | a b c d) = P(e) P(a b c d | e) / P(a b c d) .
Taking the second term in the numerator, we
can break it up using conditioning:
P(e | a b c d) = P(e) P(a b | e) P(c d | a b e) /
P(a b c d) .
Next, note that P(c d | a b e) = P(c d | e )
because e separates a and b from c and d (by
the singly connected condition). Substituting
this term for the last term in the numerator
and conditioning the denominator on a, b,
we get
P(e | a b c d) = P(e) P(a b | e) P(c d | e) / (P(a b) P(c d | a b)) .
Next, we rearrange the terms to get
P(e | a b c d) = (P(e) P(a b | e) / P(a b)) (P(c d | e) / P(c d | a b)) .
Apply Bayes’s rule in reverse to the first collection of terms, and we get
P(e | a b c d) = (P(e | a b) P(c d | e)) (1 / P(c d | a b)) .
We have now done what we set out to do.
The first term only involves the material from
e up and the second from e down. The last
term involves both, but it need not be calculated. Rather, we solve this equation for all
values of e (just true and false if e is Boolean).
The last term remains the same, so we can
calculate it by making sure that the probabilities for all the values of E sum to 1. Naturally,
to make this sketch into a real algorithm for
finding conditional probabilities for polytree
Bayesian networks, we need to show how to
calculate P(e | a b) and P(c d | e), but the ease
with which we divided the problem into two
distinct parts should serve to indicate that
these calculations can efficiently be done. For
a complete description of the algorithm, see
Pearl (1988) or Neapolitan (1990).
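The decomposition can be verified numerically on a tiny polytree. In the sketch below (invented CPT numbers; a and b are parents of e, c and d its children), we brute-force P(e | a b c d) from the joint and compare it with P(e | a b) P(c d | e) / P(c d | a b):

```python
from itertools import product

# A tiny singly connected network: a -> e <- b, e -> c, e -> d.
P_a, P_b = 0.3, 0.6
P_e = {(True, True): 0.9, (True, False): 0.5,
       (False, True): 0.4, (False, False): 0.1}   # P(e | a, b)
P_c = {True: 0.8, False: 0.2}                     # P(c | e)
P_d = {True: 0.3, False: 0.7}                     # P(d | e)

def bern(p, v):
    return p if v else 1 - p

def joint(a, b, e, c, d):
    return (bern(P_a, a) * bern(P_b, b) * bern(P_e[(a, b)], e)
            * bern(P_c[e], c) * bern(P_d[e], d))

def prob(pred):
    """Sum the joint over all worlds where the predicate holds."""
    return sum(joint(*w) for w in product((True, False), repeat=5)
               if pred(*w))

# Left side: P(e | a b c d), by brute-force enumeration.
lhs = (prob(lambda a, b, e, c, d: a and b and e and c and d)
       / prob(lambda a, b, e, c, d: a and b and c and d))

# Right side: P(e | a b) P(c d | e) / P(c d | a b).
p_e_ab = P_e[(True, True)]
p_cd_e = (prob(lambda a, b, e, c, d: e and c and d)
          / prob(lambda a, b, e, c, d: e))
p_cd_ab = (prob(lambda a, b, e, c, d: a and b and c and d)
           / prob(lambda a, b, e, c, d: a and b))
rhs = p_e_ab * p_cd_e / p_cd_ab
```

The two sides agree exactly here because c and d are separated from a and b by e, which is precisely the singly connected condition the derivation relies on.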
Now, at several points in the previous discussion, we made use of the fact that the network was singly connected, so the same
argument does not work for the general case.
Figure 8. A Multiply Connected Network. There are two paths between node a and node d.

Figure 9. A Clustered, Multiply Connected Network. By clustering nodes b and c, we turned the graph of figure 8 into a singly connected network.
However, exactly what is it that makes multiply connected networks hard? At first glance,
it might seem that any belief network ought
to be easy to evaluate. We get some evidence.
Assume it is the value of a particular node. (If
it is the values of several nodes, we just take
one at a time, reevaluating the network as we
consider each extra fact in turn.) It seems that
we located at every node all the information
we need to decide on its probability. That is,
once we know the probability of its neighbors, we can determine its probability. (In
fact, all we really need is the probability of its
parents.)
These claims are correct but misleading. In
singly connected networks, a change in one
neighbor of e cannot change another neighbor of e except by going through e itself. This
is because of the single-connection condition.
Once we allow multiple connections between
nodes, calculations are not as easy. Consider
figure 8. Suppose we learn that node d has
the value true, and we want to know the conditional probabilities at node c. In this network, the change at d will affect c in more
than one way. Not only does c have to
account for the direct change in d but also
the change in a that will be caused by d
through b. Unlike before, these changes do
not separate cleanly.
To evaluate multiply connected networks
exactly, one has to turn the network into an
equivalent singly connected one. There are a
few ways to perform this task. The most
common ways are variations on a technique
called clustering. In clustering, one combines
nodes until the resulting graph is singly connected. Thus, to turn figure 8 into a singly
connected network, one can combine nodes
b and c. The resulting graph is shown in
figure 9. Note now that the node {b c} has as
its values the cross-product of the values of b
and c singly. There are well-understood techniques for producing the necessary local
probabilities for the clustered network. Then
one evaluates the network using the singly
connected algorithm. The values for the variables from the original network can then be
read off those of the clustered network. (For
example, the values of b and c can easily be
calculated from the values for {b c}.) At the
moment, a variant of this technique proposed by Lauritzen and Spiegelhalter (1988)
and improved by Jensen (1989) is the fastest
exact algorithm for most applications. The
problem, of course, is that the nodes one
creates might have large numbers of values.
A node that was the combination of 10
Boolean-valued nodes would have 1024
values. For dense networks, this explosion
of values and worse can happen. Thus, one
often considers settling for approximations
of the exact value. We turn to this area next.
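A minimal sketch of what clustering does to the state space: the clustered node's values are the cross-product of its members' values, and the original variables are read off by summing. The probabilities below are invented.

```python
# Posterior over the clustered node {b c}: one probability per value of
# the cross-product (invented numbers that sum to 1).
p_bc = {(True, True): 0.10, (True, False): 0.35,
        (False, True): 0.15, (False, False): 0.40}

# Reading off the original variables from the clustered node:
p_b = sum(pr for (b, c), pr in p_bc.items() if b)   # marginal P(b)
p_c = sum(pr for (b, c), pr in p_bc.items() if c)   # marginal P(c)

# A cluster of 10 Boolean nodes has 2**10 joint values -- the value
# explosion that motivates approximation.
n_values = 2 ** 10
```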
Approximate Solutions
There are a lot of ways to find approximations of the conditional probabilities in a
Bayesian network. Which way is the best
depends on the exact nature of the network.
However, many of the algorithms have a lot
in common. Essentially, they randomly posit
values for some of the nodes and then use
them to pick values for the other nodes. One
then keeps statistics on the values that the
nodes take, and these statistics give the
answer. To take a particularly clear case, the
technique called logic sampling (Henrion
1988) guesses the values of the root nodes in
accordance with their prior probabilities.
Thus, if v is a root node, and P(v) = .2, one
randomly chooses a value for this node but in
such a way that it is true about 20 percent of
the time. One then works one’s way down the
network, guessing the value of the next lower
node on the basis of the values of the higher
nodes. Thus, if, say, the nodes a and b, which
are above c, have been assigned true and
false, respectively, and P(c | a ¬ b) = .8, then we
pick a random number between 0 and 1, and
if it is less than .8, we assign c to true, otherwise, false. We do this procedure all the way
down and track how often each of our nodes
is assigned to each of its values. Note that, as
described, this procedure does not take evidence nodes into account. This problem can
be fixed, and there are variations that improve
it for such cases (Shachter and Peot 1989; Shwe
and Cooper 1990). There are also different
approximation techniques (see Horvitz, Suermondt, and Cooper [1989]). At the moment,
however, there does not seem to be a single
technique, either approximate or exact, that
works well for all kinds of networks. (It is
interesting that for the exact algorithms, the
feature of the network that determines performance is the topology, but for the approximation algorithms, it is the quantities.) Given
the NP-hard result, it is unlikely that we will
ever get an exact algorithm that works well
for all kinds of Bayesian networks. It might be
possible to find an approximation scheme
that works well for everything, but it might
be that in the end, we will simply have a
library of algorithms, and researchers will
have to choose the one that best suits their
problem.
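Logic sampling itself is only a few lines. The sketch below (hypothetical CPT numbers for the five-variable example network; not the article's code) guesses the roots from their priors, works downward, and estimates a marginal from the tallies:

```python
import random

random.seed(0)

# Invented CPTs for the family-out network:
p_fo, p_bp = 0.15, 0.01
p_lo = {True: 0.60, False: 0.05}                    # P(lo | fo)
p_do = {(True, True): 0.99, (True, False): 0.90,
        (False, True): 0.97, (False, False): 0.30}  # P(do | fo, bp)
p_hb = {True: 0.70, False: 0.01}                    # P(hb | do)

def flip(p):
    """True with probability p."""
    return random.random() < p

def logic_sample():
    """Draw one complete assignment: roots first, then children."""
    fo, bp = flip(p_fo), flip(p_bp)
    lo = flip(p_lo[fo])
    do = flip(p_do[(fo, bp)])
    hb = flip(p_hb[do])
    return fo, bp, lo, do, hb

n = 50_000
hits = sum(logic_sample()[4] for _ in range(n))  # how often hear-bark is true
p_hb_est = hits / n
```

As the text notes, this plain version ignores evidence nodes; conditioning requires discarding or reweighting samples that disagree with the observations.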
Finally, I should mention that for those
who have Bayesian networks to evaluate but
do not care to implement the algorithms
themselves, at least two software packages are
around that implement some of the algorithms I mentioned: IDEAL (Srinivas and Breese
1989, 1990) and HUGIN (Andersen 1989).
Applications
As I stated in the introduction, Bayesian networks are now being used in a variety of
applications. As one would expect, the most
common is diagnosis problems, particularly, medical diagnosis.

Figure 10. Map Learning. Finding the north-south corridor makes it more likely that there is an intersection north of the robot’s current location.

A current example of
the use of Bayesian networks in this area is
PATHFINDER (Heckerman 1990), a program to
diagnose diseases of the lymph node. A
patient suspected of having a lymph node
disease has a lymph node removed and examined by a pathologist. The pathologist examines it under a microscope, and the information
gained thereby, possibly together with other
tests on the node, leads to a diagnosis.
PATHFINDER allows a physician to enter the
information and get the conditional probabilities of the diseases given the evidence so far.
PATHFINDER also uses decision theory. Decision theory is a close cousin of probability
theory in which one also specifies the desirability of various outcomes (their utility) and
the costs of various actions that might be performed to affect the outcomes. The idea is to
find the action (or plan) that maximizes the
expected utility minus costs. Bayesian networks have been extended to handle decision
theory. A Bayesian network that incorporates
decision nodes (nodes indicating actions that
can be performed) and value nodes (nodes
indicating the values of various outcomes) is
called an influence diagram, a concept invented by Howard (Howard and Matheson 1981).

Figure 11. Bayesian Network for a Simple Story. Connecting “straw” to the earlier context makes the drink-straw reading more likely. (Nodes: eat out, straw-drinking, order milk-shake, drink-straw, animal-straw, and the observed words “order,” “milk-shake,” and “straw.”)
In PATHFINDER, decision theory is used to
choose the next test to be performed when the
current tests are not sufficient to make a diagnosis. PATHFINDER has the ability to make treatment decisions as well but is not used for this
purpose because the decisions seem to be sensitive to details of the utilities. (For example,
how much treatment pain would you tolerate
to decrease the risk of death by a certain
percentage?)
PATHFINDER’s model of lymph node diseases
includes 60 diseases and over 130 features
that can be observed to make the diagnosis.
Many of the features have more than 2 possible outcomes (that is, they are not binary
valued). (Nonbinary values are common for
laboratory tests with real-number results. One
could conceivably have the possible values of
the random variable be the real numbers, but
typically to keep the number of values finite,
one breaks the values into significant regions.
I gave an example of this early on with earthquake, where we divided the Richter scale for
earthquake intensities into 5 regions.) Various
versions of the program have been implemented (the current one is PATHFINDER-4), and the
use of Bayesian networks and decision theory
has proven better than (1) MYCIN-style certainty
factors (Shortliffe 1976), (2) Dempster-Shafer
theory of belief (Shafer 1976), and (3) simpler
Bayesian models (ones with less realistic independence assumptions). Indeed, the program
has achieved expert-level performance and
has been implemented commercially.
Bayesian networks are being used in less
obvious applications as well. At Brown University, there are two such applications: map
learning (the work of Ken Basye and Tom
Dean) and story unders