Transcript of bayes-nets.c19
CMSC 471
Fall 2002
Class #19 – Monday, November 4
Today’s class
• (Probability theory)
• Bayesian inference
– From the joint distribution
– Using independence/factoring
– From sources of evidence
• Bayesian networks
– Network structure
– Conditional probability tables
– Conditional independence
– Inference in Bayesian networks
Bayesian Reasoning /
Bayesian Networks
Chapters 14, 15.1-15.2
Why probabilities anyway?
• Kolmogorov showed that three simple axioms lead to the rules of probability theory
– De Finetti, Cox, and Carnap have also provided compelling arguments for these axioms
1. All probabilities are between 0 and 1:
• 0 <= P(a) <= 1
2. Valid propositions (tautologies) have probability 1, and unsatisfiable propositions have probability 0:
• P(true) = 1; P(false) = 0
3. The probability of a disjunction is given by:
• P(a ∨ b) = P(a) + P(b) – P(a ∧ b)
[Venn diagram: a, b, and their overlap a ∧ b]
Inference from the joint: Example
                        alarm                       ¬alarm
             earthquake   ¬earthquake    earthquake   ¬earthquake
burglary        .001         .008           .0001        .0009
¬burglary       .01          .09            .001         .79
P(Burglary | alarm) = α P(Burglary, alarm)
= α [P(Burglary, alarm, earthquake) + P(Burglary, alarm, ¬earthquake)]
= α [(.001, .01) + (.008, .09)]
= α [(.009, .1)]
Since P(burglary | alarm) + P(¬burglary | alarm) = 1, α = 1/(.009 + .1) = 9.174
(i.e., P(alarm) = 1/α = .109 – quizlet: how can you verify this?)
P(burglary | alarm) = .009 × 9.174 = .0826
P(¬burglary | alarm) = .1 × 9.174 = .9174
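A minimal Python sketch of the same computation (illustrative names, no particular library): store the joint as a dictionary keyed by (Burglary, Alarm, Earthquake) assignments, sum out Earthquake with Alarm fixed to true, then normalize.

```python
# Joint distribution over (Burglary, Alarm, Earthquake), from the table above
joint = {
    (True,  True,  True):  .001,  (True,  True,  False): .008,
    (True,  False, True):  .0001, (True,  False, False): .0009,
    (False, True,  True):  .01,   (False, True,  False): .09,
    (False, False, True):  .001,  (False, False, False): .79,
}

# P(Burglary, alarm): sum out earthquake, keeping only entries with alarm true
unnorm = {b: sum(p for (b2, a, e), p in joint.items() if b2 == b and a)
          for b in (True, False)}

alpha = 1.0 / sum(unnorm.values())  # 1 / P(alarm)
posterior = {b: alpha * p for b, p in unnorm.items()}
print(posterior)  # {True: ~.0826, False: ~.9174}
```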
Independence
• When two sets of propositions do not affect each other’s probabilities, we call them independent, and can easily compute their joint and conditional probabilities:
– Independent(A, B) → P(A ∧ B) = P(A) P(B), P(A | B) = P(A)
• For example, {moon-phase, light-level} might be independent of {burglary, alarm, earthquake}
– Then again, it might not: Burglars might be more likely to burglarize houses when there’s a new moon (and hence little light)
– But if we know the light level, the moon phase doesn’t affect whether we are burglarized
– Once we’re burglarized, light level doesn’t affect whether the alarm goes off
• We need a more complex notion of independence, and methods for reasoning about these kinds of relationships
Conditional independence
• Absolute independence:
– A and B are independent if P(A ∧ B) = P(A) P(B); equivalently, P(A) = P(A | B) and P(B) = P(B | A)
• A and B are conditionally independent given C if
– P(A ∧ B | C) = P(A | C) P(B | C)
• This lets us decompose the joint distribution:
– P(A ∧ B ∧ C) = P(A | C) P(B | C) P(C)
• Moon-Phase and Burglary are conditionally independent given Light-Level
• Conditional independence is weaker than absolute independence, but still useful in decomposing the full joint probability distribution
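A quick numeric sanity check of that decomposition, with made-up CPTs: if A and B are conditionally independent given C, the eight joint entries built from P(A | C), P(B | C), and P(C) form a valid distribution.

```python
from itertools import product

# Hypothetical CPTs: P(C), P(A | C), P(B | C)
p_c = 0.3
p_a_given_c = {True: 0.8, False: 0.1}
p_b_given_c = {True: 0.6, False: 0.2}

def joint_abc(a, b, c):
    """P(A ∧ B ∧ C) = P(A | C) P(B | C) P(C), by conditional independence."""
    pa = p_a_given_c[c] if a else 1 - p_a_given_c[c]
    pb = p_b_given_c[c] if b else 1 - p_b_given_c[c]
    pc = p_c if c else 1 - p_c
    return pa * pb * pc

total = sum(joint_abc(a, b, c) for a, b, c in product([True, False], repeat=3))
assert abs(total - 1.0) < 1e-12  # the entries sum to one
```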
Bayes’ rule
• Bayes’ rule is derived from the product rule:
– P(Y | X) = P(X | Y) P(Y) / P(X)
• Often useful for diagnosis:
– If X are (observed) effects and Y are (hidden) causes,
– We may have a model for how causes lead to effects (P(X | Y))
– We may also have prior beliefs (based on experience) about the frequency of occurrence of causes (P(Y))
– Which allows us to reason abductively from effects to causes (P(Y | X))
Bayesian inference
• In the setting of diagnostic/evidential reasoning:
– Hypotheses Hi; evidence/manifestations E1, …, Em
– Know the prior probability of each hypothesis, P(Hi), and the conditional probability P(Ej | Hi)
– Want to compute the posterior probability P(Hi | Ej)
• Bayes’ theorem (formula 1):
– P(Hi | Ej) = P(Hi) P(Ej | Hi) / P(Ej)
Simple Bayesian diagnostic reasoning
• Knowledge base:
– Evidence / manifestations: E1, …, Em
– Hypotheses / disorders: H1, …, Hn
• Ej and Hi are binary; hypotheses are mutually exclusive (non-overlapping) and exhaustive (cover all possible cases)
– Conditional probabilities: P(Ej | Hi), i = 1, …, n; j = 1, …, m
• Cases (evidence for a particular instance): E1, …, El
• Goal: Find the hypothesis Hi with the highest posterior
– maxi P(Hi | E1, …, El)
Bayesian diagnostic reasoning II
• Bayes’ rule says that
– P(Hi | E1, …, El) = P(E1, …, El | Hi) P(Hi) / P(E1, …, El)
• Assume each piece of evidence Ej is conditionally independent of the others, given a hypothesis Hi; then:
– P(E1, …, El | Hi) = ∏_{j=1}^{l} P(Ej | Hi)
• If we only care about relative probabilities for the Hi, then we have:
– P(Hi | E1, …, El) = α P(Hi) ∏_{j=1}^{l} P(Ej | Hi)
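A sketch of this relative-probability computation with hypothetical numbers, for two mutually exclusive hypotheses and two observed pieces of evidence:

```python
import math

priors = {"H1": 0.7, "H2": 0.3}    # P(Hi); hypothetical numbers
likelihoods = {"H1": [0.9, 0.2],   # P(E1 | H1), P(E2 | H1)
               "H2": [0.5, 0.8]}   # P(E1 | H2), P(E2 | H2)

# Unnormalized posterior: P(Hi) * prod_j P(Ej | Hi), then normalize
unnorm = {h: priors[h] * math.prod(likelihoods[h]) for h in priors}
alpha = 1.0 / sum(unnorm.values())
posterior = {h: alpha * v for h, v in unnorm.items()}
print(max(posterior, key=posterior.get), posterior)  # H1 wins, ~0.51 vs 0.49
```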
Limitations of simple Bayesian inference
• Cannot easily handle multi-fault situations, nor cases where intermediate (hidden) causes exist:
– Disease D causes syndrome S, which causes correlated manifestations M1 and M2
• Consider a composite hypothesis H1 ∧ H2, where H1 and H2 are independent. What is the relative posterior?
– P(H1 ∧ H2 | E1, …, El) = α P(E1, …, El | H1 ∧ H2) P(H1 ∧ H2)
  = α P(E1, …, El | H1 ∧ H2) P(H1) P(H2)
  = α ∏_{j=1}^{l} P(Ej | H1 ∧ H2) P(H1) P(H2)
• How do we compute P(Ej | H1 ∧ H2)?
Limitations of simple Bayesian inference II
• Assume H1 and H2 are independent, given E1, …, El?
– P(H1 ∧ H2 | E1, …, El) = P(H1 | E1, …, El) P(H2 | E1, …, El)
• This is a very unreasonable assumption
– Earthquake and Burglar are independent, but not given Alarm:
• P(burglar | alarm, earthquake) << P(burglar | alarm)
• Another limitation is that simple application of Bayes’ rule doesn’t allow us to handle causal chaining:
– A: this year’s weather; B: cotton production; C: next year’s cotton price
– A influences C indirectly: A → B → C
– P(C | B, A) = P(C | B)
• Need a richer representation to model interacting hypotheses, conditional independence, and causal chaining
• Next time: conditional independence and Bayesian networks!
Bayesian Belief Networks (BNs)
• Definition: BN = (DAG, CPD)
– DAG: directed acyclic graph (the BN’s structure)
• Nodes: random variables (typically binary or discrete, but methods also exist to handle continuous variables)
• Arcs: indicate probabilistic dependencies between nodes (lack of a link signifies conditional independence)
– CPD: conditional probability distribution (the BN’s parameters)
• Conditional probabilities at each node, usually stored as a table (conditional probability table, or CPT)
• Each node stores P(xi | πi), where πi is the set of all parent nodes of xi
– Root nodes are a special case – no parents, so just use priors in the CPD: πi = ∅, so P(xi | πi) = P(xi)
Example BN
[Network structure: a → b, a → c, b → d, c → d, c → e]

P(A) = 0.001
P(B|A) = 0.3        P(B|¬A) = 0.001
P(C|A) = 0.2        P(C|¬A) = 0.005
P(D|B,C) = 0.1      P(D|B,¬C) = 0.01
P(D|¬B,C) = 0.01    P(D|¬B,¬C) = 0.00001
P(E|C) = 0.4        P(E|¬C) = 0.002

Note that we only specify P(A) etc., not P(¬A), since the two have to sum to one.
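One plain-Python way to store these parameters (a sketch, not any standard library’s API): one dictionary per CPT, mapping parent values to P(child = true).

```python
# CPTs from the slide, each giving P(variable = true | parent assignment)
P_A = 0.001
P_B_given_A = {True: 0.3, False: 0.001}   # key: value of parent A
P_C_given_A = {True: 0.2, False: 0.005}
P_D_given_BC = {(True, True): 0.1,   (True, False): 0.01,
                (False, True): 0.01, (False, False): 0.00001}
P_E_given_C = {True: 0.4, False: 0.002}

def p(prob_true, value):
    """P of a binary variable taking `value`, given P(variable = true)."""
    return prob_true if value else 1.0 - prob_true
```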
Topological semantics
• A node is conditionally independent of its non-descendants given its parents
• A node is conditionally independent of all other nodes in the network given its parents, children, and children’s parents (also known as its Markov blanket)
• The method called d-separation can be applied to decide whether a set of nodes X is independent of another set Y, given a third set Z
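A small sketch of the Markov blanket computation over a parent-list representation of the example network’s DAG (the function name is ours, not a standard API):

```python
# Parent lists for the example network above
parents = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"], "e": ["c"]}

def markov_blanket(node):
    """Parents, children, and children's other parents of `node`."""
    children = [n for n, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])  # co-parents of each child
    blanket.discard(node)
    return blanket

print(markov_blanket("c"))  # {'a', 'b', 'd', 'e'}
```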
Independence and chaining
• Independence assumption: P(xi | πi, q) = P(xi | πi),
where q is any set of variables (nodes) other than xi and its successors
– πi blocks the influence of other nodes on xi and its successors (q influences xi only through variables in πi)
– With this assumption, the complete joint probability distribution of all variables in the network can be represented by (recovered from) local CPDs by chaining these CPDs:
P(x1, …, xn) = ∏_{i=1}^{n} P(xi | πi)
Chaining: Example
Computing the joint probability for all variables is easy:
P(a, b, c, d, e)
= P(e | a, b, c, d) P(a, b, c, d)           by the product rule
= P(e | c) P(a, b, c, d)                    by indep. assumption
= P(e | c) P(d | a, b, c) P(a, b, c)
= P(e | c) P(d | b, c) P(c | a, b) P(a, b)
= P(e | c) P(d | b, c) P(c | a) P(b | a) P(a)

[Network: a → b, a → c, b → d, c → d, c → e]
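A sketch of this chain as code, reusing the CPT dictionaries and the helper p() from the Example BN sketch above (joint is our name for the chained product):

```python
def joint(a, b, c, d, e):
    """P(a, b, c, d, e) = P(e|c) P(d|b,c) P(c|a) P(b|a) P(a)."""
    return (p(P_E_given_C[c], e) *
            p(P_D_given_BC[(b, c)], d) *
            p(P_C_given_A[a], c) *
            p(P_B_given_A[a], b) *
            p(P_A, a))

print(joint(True, True, True, True, True))  # 0.4 * 0.1 * 0.2 * 0.3 * 0.001 = 2.4e-06
```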
Direct inference with BNs
• Now suppose we just want the probability for one variable
• Belief update method:
– Original belief (no variables are instantiated): use the prior probability P(xi)
– If xi is a root, then P(xi) is given directly in the BN (the CPT at Xi)
– Otherwise: P(xi) = Σ_{πi} P(xi | πi) P(πi)
• In this equation, P(xi | πi) is given in the CPT, but computing P(πi) is complicated
Computing πi: Example
• P(d) = Σ_{b,c} P(d | b, c) P(b, c)
• P(b, c) = P(a, b, c) + P(¬a, b, c)                     (marginalizing over a)
  = P(b | a, c) P(a, c) + P(b | ¬a, c) P(¬a, c)          (product rule)
  = P(b | a) P(c | a) P(a) + P(b | ¬a) P(c | ¬a) P(¬a)
• If some variables are instantiated, we can “plug that in” and reduce the amount of marginalization
• Still have to marginalize over all values of uninstantiated parents – not computationally feasible with large networks

[Network: a → b, a → c, b → d, c → d, c → e]
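The brute-force version of this marginalization, reusing joint() from the chaining sketch: sum the full joint over every assignment of the remaining variables. Fine for five binary nodes (16 terms per query), hopeless for large networks.

```python
from itertools import product

def marginal_d(d):
    """P(d): sum the joint over all values of a, b, c, e."""
    return sum(joint(a, b, c, d, e)
               for a, b, c, e in product([True, False], repeat=4))

print(marginal_d(True))  # small: d is unlikely under these CPTs
```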
Representational extensions
• Compactly representing CPTs
– Noisy-OR
– Noisy-MAX
• Adding continuous variables
– Discretization
– Use density functions (usually mixtures of Gaussians) to build hybrid Bayesian networks (with discrete and continuous variables)
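A minimal sketch of the noisy-OR idea: each active parent independently fails to produce the effect with its own inhibition probability q, so the CPT needs one number per parent rather than one per parent combination. Names and numbers below are illustrative.

```python
def noisy_or(inhibitors, active):
    """P(effect | parents) = 1 - product of q over the active parents.

    inhibitors: dict mapping parent name -> q, the probability that the
    parent, when true, fails to cause the effect.
    active: set of parent names that are currently true.
    """
    p_all_fail = 1.0
    for parent, q in inhibitors.items():
        if parent in active:
            p_all_fail *= q
    return 1.0 - p_all_fail

# Hypothetical: fever caused by cold (q = 0.6) or flu (q = 0.2)
print(noisy_or({"cold": 0.6, "flu": 0.2}, {"cold", "flu"}))  # 1 - 0.12 = 0.88
```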
Inference tasks
• Simple queries: Compute the posterior marginal P(Xi | E=e)
– E.g., P(NoGas | Gauge=empty, Lights=on, Starts=false)
• Conjunctive queries:
– P(Xi, Xj | E=e) = P(Xi | E=e) P(Xj | Xi, E=e)
• Optimal decisions: Decision networks include utility information; probabilistic inference is required to find P(outcome | action, evidence)
• Value of information: Which evidence should we seek next?
• Sensitivity analysis: Which probability values are most critical?
• Explanation: Why do I need a new starter motor?
Approaches to inference
• Exact inference
– Enumeration
– Variable elimination
– Clustering / join tree algorithms
• Approximate inference
– Stochastic simulation / sampling methods
– Markov chain Monte Carlo methods
– Genetic algorithms
– Neural networks
– Simulated annealing
– Mean field theory
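A sketch of the simplest stochastic-simulation method (rejection sampling) on the example network, reusing the CPT dictionaries from the Example BN sketch: sample nodes in topological order, keep only samples consistent with the evidence, and count. Note how wasteful this is when the evidence is unlikely, which motivates the more sophisticated methods above.

```python
import random

def sample_bool(p_true):
    return random.random() < p_true

def prior_sample():
    """Sample all five variables in topological order from their CPTs."""
    a = sample_bool(P_A)
    b = sample_bool(P_B_given_A[a])
    c = sample_bool(P_C_given_A[a])
    d = sample_bool(P_D_given_BC[(b, c)])
    e = sample_bool(P_E_given_C[c])
    return a, b, c, d, e

def estimate_b_given_d(n=1_000_000):
    """Estimate P(b | d) by rejecting samples where d is false."""
    kept = hits = 0
    for _ in range(n):
        a, b, c, d, e = prior_sample()
        if d:  # keep only samples consistent with the evidence
            kept += 1
            hits += b
    return hits / kept if kept else float("nan")

print(estimate_b_given_d())  # noisy: d is rare, so most samples are rejected
```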