
The most common approach by those pursuing Bayesian metrics for causal dis-
covery thus far has been to assume that dags within Markov equivalence classes
are inherently indistinguishable, and to apply a uniform prior probability over all
of them; see, for example, Madigan et al. [177]. Again, according to Heckerman
and Geiger [103], equal scores for Markov equivalent causal structures will often be
appropriate, even though the different dag structures can be distinguished under the
causal interpretation.
Note that we are not here referring to a uniform prior over patterns, which we considered in section 8.4, but to a uniform prior within patterns. Nonetheless, the considerations
turn out to be analogous. The kind of indistinguishability that has been justified for
causal structures within a single Markov equivalence class is observational indistin-
guishability. The tendency to interpret this as in-principle indistinguishability is not
justified. After all, the distinguishability under the causal interpretation is clear: a causal intervention on the root variable of the chain in Figure 8.2 will influence the variables downstream of it in the chain but not in the common cause structure. Even if we are limited to observational data, the differences between the chain and the common cause structure will become manifest if we expand the scope of our observations. For example, if we include a new parent of that variable, a new v-structure will be introduced only if the variable is participating in the common cause structure, resulting in the augmented dags falling into distinct Markov equivalence classes.
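The v-structure argument above can be illustrated with a short sketch. The code below is not from the text; the variable names X, Y, Z and the new parent W are hypothetical stand-ins for the (unreproduced) labels of Figure 8.2. It checks that a chain and a common cause structure share the same (empty) set of v-structures, while adding a new parent of the chain's root creates a v-structure only in the common cause version:

```python
from itertools import combinations

def v_structures(edges):
    """Return the v-structures (colliders) of a dag given as a list of
    (parent, child) arcs: pairs A -> C <- B where A and B are not adjacent."""
    parents = {}
    adjacent = set()
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
        adjacent.add(frozenset((a, b)))
    found = set()
    for child, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in adjacent:
                found.add((frozenset((a, b)), child))
    return found

# Chain X -> Y -> Z and common cause X <- Y -> Z: same skeleton,
# no v-structures, hence the same Markov equivalence class.
chain = [("X", "Y"), ("Y", "Z")]
common = [("Y", "X"), ("Y", "Z")]
assert v_structures(chain) == v_structures(common) == set()

# A new parent W of X separates them: only the common cause structure,
# where X is a child, gains the v-structure W -> X <- Y.
assert v_structures(chain + [("W", "X")]) == set()
assert v_structures(common + [("W", "X")]) == {(frozenset({"W", "Y"}), "X")}
```

The augmented dags thus differ in their v-structures and so fall into distinct Markov equivalence classes, as claimed.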
We can extend this reasoning to linear extensions. A totally ordered model,
or TOM, is a dag together with one of its linear extensions. It is a plausible view of
causal structures that they are, at bottom, TOMs. The dags represent causal processes
(chains) linking together events which take place, in any given instance, at particular
times, or during particular time intervals. All of the events are ordered by time.
When we adopt a dag, without a total ordering, to represent a causal process, we are
representing our ignorance about the underlying causal story by allowing multiple,
consistent TOMs to be entertained. Our ignorance is not in-principle ignorance: as
our causal understanding grows, new variables will be identified and placed within
it. It is entirely possible that in the end we shall be left with only one possible linear
extension for our original problem. Hence, the more TOMs (linear extensions) that
are compatible with the original dag we consider, the more possible ways there are
for the dag to be realized and, thus, the greater the prior probability of its being true.
In short, not only is it correct MML coding practice to adjust for the number of
linear extensions in estimating a causal model’s code length, it is also the correct
Bayesian interpretation of causal inference.
This subsection might well appear to the reader to be an unimportant aside, especially in view of the fact mentioned previously in section 8.2.1: counting linear extensions is computationally intractable in general [32]. In consequence, the MML code presented so far does
not directly translate into a tractable algorithm for scoring causal models. Indeed, its
direct implementation in a greedy search was never applied to problems with more
than ten variables for that reason [293]. However, TOMs figure directly in the sampling solution to the search and MML scoring problems, presented in section 8.6 below.
© 2004 by Chapman & Hall/CRC Press LLC