
For each joint observation of all variables:
(a) Identify which state $x_k$ each node $X$ takes.
(b) Update $\alpha_k$ to $\alpha_k + 1$ for the distribution over $X$ corresponding to the parent instantiation in the observation.
Thus, we have a very simple counting solution to the problem of parameterizing
multinomial networks. This solution is certainly the most widely used and is available in the standard Bayesian network tools.
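For concreteness, the following is a minimal Python sketch of this counting procedure, not code from the text: it assumes observations arrive as dictionaries mapping variable names to states, and it keeps one Dirichlet count vector per parent instantiation of the node being parameterized.

from itertools import product

def parameterize(node, states, parents, parent_states, data, alpha0=1.0):
    """Counting parameterization sketch for one node of a multinomial network.

    node          -- name of the child variable
    states        -- list of the child variable's states
    parents       -- list of parent variable names
    parent_states -- dict mapping each parent name to its list of states
    data          -- iterable of dicts mapping variable names to observed states
    alpha0        -- prior hyperparameter assigned to every cell (illustrative)
    """
    # Start every cell of every conditional distribution at alpha_k = alpha0.
    alphas = {ps: {x: alpha0 for x in states}
              for ps in product(*(parent_states[p] for p in parents))}
    for obs in data:
        ps = tuple(obs[p] for p in parents)   # (a) the observed parent instantiation
        alphas[ps][obs[node]] += 1            # (b) update alpha_k to alpha_k + 1
    # Spot (posterior-mean) estimates: p(x_k | ps) = alpha_k / sum_j alpha_j.
    return {ps: {x: a / sum(dist.values()) for x, a in dist.items()}
            for ps, dist in alphas.items()}

Calling this once per node fills in the network's conditional probability tables by counting alone; for instance, parameterize("Fever", ["yes", "no"], ["Flu"], {"Flu": ["yes", "no"]}, data) for a hypothetical Flu-to-Fever arc (the variable names are invented for illustration) returns one estimated distribution over Fever for each state of Flu.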
The assumptions behind this algorithm are:
1. Local parameter independence, per Equation (7.10).
2. Parameter independence across distinct parent instantiations. That is, the pa-
rameter values when the parents take one state do not influence the parameter
values when parents take a different state.
3. Parameter independence across non-local states. That is, the states adopted by
other parts of the network do not influence the parameter values for a node
once its parent instantiation is given.
4. The parameter distributions are within a conjugate family of priors; specifically, they are Dirichlet distributed (see the factorization sketched just below).
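Taken together, assumptions 1, 2, and 4 say that the prior over the full parameter vector factorizes into independent Dirichlets, one per node and parent instantiation. In symbols (the notation here is supplied for illustration rather than drawn from the text), writing $\theta_{ijk} = P(X_i = x_k \mid \mathrm{pa}(X_i) = j)$:

\[
\rho(\theta) \;=\; \prod_{i} \prod_{j} \rho(\theta_{ij}),
\qquad
\theta_{ij} \sim \mathrm{Dirichlet}(\alpha_{ij1}, \ldots, \alpha_{ijK_i}),
\]

so each conditional distribution $\theta_{ij}$ can be estimated by counting in isolation, which is exactly what Algorithm 7.1 does.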
The third assumption is already guaranteed by the Markov property assumed as a matter of general practice for the Bayesian network as a whole. (To be sure, the Markov property does not imply parameter independence from the parameters of descendants, so the third assumption has this stronger implication.) The first and second assumptions are more substantial and, frequently, wrong. When they are wrong, the implication is that dependencies between parameter values are not being recognized in the learning process, with the result that the information afforded by such dependencies is neglected. The upshot is that Algorithm 7.1 will still work, but it will learn more slowly, in the sense of needing more data, than methods which take advantage of parameter dependencies to
re-estimate the values of some parameters given those of others. The algorithm must
painstakingly count up values for each and every cell in each and every conditional
probability table without any reference to other cells. This slowness of Algorithm 7.1
can be troublesome because many parent instantiations, especially when dealing with
large arity (large numbers of joint parent states), may be rare in the data, leaving us
with a weak parameterization of the network. We will examine different methods of
taking advantage of parameter dependence in probability learning in Section 7.4 below.
The fourth assumption, that the parameter priors are Dirichlet distributed, enables
the application of the simple Algorithm 7.1 to parameterization. Of course, there are
infinities of other possible prior distributions over parameters; but choosing outside
of the Dirichlet family requires a different estimation algorithm. The exponential
family of distributions, which subsumes the Dirichlet family, admits of tractable estimation methods [71]. In any case, choosing inaccurate hyperparameters for the
Dirichlet is a more likely source of practical trouble in estimating parameters than the choice of the Dirichlet family itself.
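To see why the hyperparameters matter in practice, recall that the posterior mean under a Dirichlet prior with hyperparameters $\alpha_1, \ldots, \alpha_K$, after observing counts $n_1, \ldots, n_K$, is $(\alpha_k + n_k) / \sum_j (\alpha_j + n_j)$. A small Python illustration with invented numbers shows how an overconfident prior swamps the sparse data that rare parent instantiations produce:

def posterior_mean(alphas, counts):
    # Dirichlet-multinomial spot estimate: (alpha_k + n_k) / sum_j (alpha_j + n_j).
    total = sum(a + n for a, n in zip(alphas, counts))
    return [(a + n) / total for a, n in zip(alphas, counts)]

# Three cases observed for a rare parent instantiation, all in the first state:
print(posterior_mean([1, 1], [3, 0]))    # uniform Dirichlet(1, 1): [0.8, 0.2]
print(posterior_mean([10, 10], [3, 0]))  # Dirichlet(10, 10): roughly [0.57, 0.43]

With only three cases, the Dirichlet(10, 10) prior still dominates the estimate; badly scaled hyperparameters of this sort are the practical trouble at issue.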