
7.3.3 Incomplete data: summary
In summary, when attribute values are missing in observational data, the optimal
method for learning probabilities is to compute the full conditional probability dis-
tribution over the parameters. This method, however, must consider every joint
instantiation of the missing attribute values, and so grows exponentially with the
number of missing values, making it computationally intractable. There are two
useful approximation techniques, Gibbs sampling and expectation maximization,
for asymptotically approaching the best estimated parameter values. Both of these
require strong independence assumptions, in particular that the missing values are
independent of the observed values, which limit their applicability. The alternative
of actively modeling the missing data, and using such models to assist in
parameterizing the Bayesian network, is one which commends itself to further
research. In any case, the approximation techniques are a useful start.
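For concreteness, the following is a minimal sketch of the expectation maximization idea in the simplest setting: a two-node network X -> Y over binary variables, where Y is always observed but X is sometimes missing. The data, the starting values, and the use of maximum-likelihood updates (rather than the Dirichlet-posterior updates of Algorithm 7.1) are illustrative assumptions only, and the sketch relies on the missing values being missing independently of their values, as noted above.

```python
# Illustrative EM sketch (assumed data and parameters, not from the text).
# Network: X -> Y, both binary; x is None when the attribute value is missing.
data = [(1, 1), (1, 1), (0, 0), (None, 1), (0, 1), (None, 0), (1, 0), (None, 1)]

theta_x = 0.5                  # initial guess for P(X = 1)
theta_y = {0: 0.5, 1: 0.5}     # initial guesses for P(Y = 1 | X = x)

for _ in range(50):                         # iterate E and M steps
    count_x = {0: 0.0, 1: 0.0}              # expected number of cases with X = x
    count_xy = {0: 0.0, 1: 0.0}             # expected number with X = x and Y = 1
    for x, y in data:
        if x is None:
            # E-step for a missing X: posterior P(X = 1 | Y = y) by Bayes' theorem
            like1 = theta_y[1] if y == 1 else 1 - theta_y[1]
            like0 = theta_y[0] if y == 1 else 1 - theta_y[0]
            w1 = theta_x * like1 / (theta_x * like1 + (1 - theta_x) * like0)
        else:
            w1 = float(x)                   # an observed X contributes a hard count
        for xv, w in ((1, w1), (0, 1.0 - w1)):
            count_x[xv] += w
            count_xy[xv] += w * y
    # M-step: maximum-likelihood re-estimates from the expected counts
    theta_x = count_x[1] / len(data)
    theta_y = {xv: count_xy[xv] / count_x[xv] for xv in (0, 1)}

print(theta_x, theta_y)
```

Each pass fills in the missing X values with their posterior probabilities under the current parameters and then re-estimates the parameters from the resulting expected counts; repeated passes converge towards a (local) maximum of the likelihood.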
7.4 Learning local structure
We now turn to a different kind of potential dependence between parameters: not
between missing and observed values, but between different observed values. Algo-
rithm 7.1, as you will recall, assumed that the probability distributions a child
variable takes under different parent instantiations are independent of each other,
with the consequence that any dependencies which do exist are ignored, resulting
in slower learning. When there are dependencies between the parameters relating
the parents to their child, this is called local structure, in contrast to the broader
structure specified by the arcs in the network.
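As a rough illustration of why ignoring such dependencies slows learning, the sketch below counts free parameters for a binary child with k binary parents, contrasting a full CPT with a noisy-or model, a standard example of local structure; the function names and the noisy-or choice are illustrative assumptions, not taken from the text.

```python
# Illustrative parameter counting for a binary child with k binary parents.
def full_cpt_size(k: int) -> int:
    return 2 ** k   # one free P(child | parent instantiation) per instantiation

def noisy_or_size(k: int) -> int:
    return k        # one inhibition probability per parent (leak term omitted)

for k in (2, 5, 10):
    print(k, full_cpt_size(k), noisy_or_size(k))
# With 10 parents: 1024 parameters versus 10, so each full-CPT parameter is
# estimated from only a small fraction of the data, hence the slower learning.
```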
7.4.1 Causal interaction
One of the major advantages of Bayesian networks over most alternative uncertainty
formalisms (such as PROSPECTOR [78] and Certainty Factors [34]) is that Bayes-
ian networks allow, but do not require, conditional independencies to be modeled.
Where there are dependencies, of any complexity, they can be specified to any de-
gree required. And there are many situations with local dependencies, namely all
those in which there is at most limited causal interaction between the parent vari-
ables. To take a simple example of interaction: one might ingest alkali, and die; one
might instead ingest acid, and die; but if one ingests both alkali and acid together
(to be sure, only if measured and mixed fairly exactly!) then one may well not die.
That is an interaction between the two potential causes of death. When two parent
causes fully interact, each possible instantiation of their values produces a proba-
bility distribution over the child’s values which is entirely independent of all their
other distributions. In such a case, the full power, and slowness, of the Spiegelhalter
and Lauritzen method of learning CPTs (Algorithm 7.1) is required.
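To make the alkali and acid example concrete, the sketch below fills in a full CPT for Death given the two causes with made-up probabilities (not taken from the text) and shows that a non-interacting model such as noisy-or could not reproduce the interaction, so all four entries must be learned separately.

```python
# Illustrative probabilities only: a fully interacting CPT for Death.
cpt_death = {
    # (alkali, acid): P(Death = true | alkali, acid)
    (False, False): 0.01,
    (True,  False): 0.90,
    (False, True):  0.90,
    (True,  True):  0.10,   # mixed together, the two may neutralize each other
}

# A noisy-or model (leak term ignored for simplicity) combines the single-cause
# probabilities and so can only make both causes together at least as deadly:
p_alkali = cpt_death[(True, False)]
p_acid = cpt_death[(False, True)]
noisy_or_both = 1 - (1 - p_alkali) * (1 - p_acid)
print(noisy_or_both, "vs the interacting value", cpt_death[(True, True)])  # 0.99 vs 0.10
```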
The most obvious case of local structure is that where the variables are continuous