
temporal ordering of nodes (equivalently, reversing the arc directions relating those
nodes) must be at least as dense as the true model. In any case, a little reflection will
reveal that acceptance of Reichenbach’s Principle of the Common Cause implies that
non-causal distributions are strictly derivative from and explained by some causal
model. Of course, it is possible to reject Reichenbach’s principle (as, for example,
Williamson does [299]), but it is not clear that we can reject the principle while still
making sense of scientific method.
As we shall see below, many causal discovery algorithms evade, or presuppose
some solution to, the problem of identifying the correct variable order. In principle,
these algorithms can be made complete by iterating through all possible orderings
to find the sparsest model that the algorithm can identify for each given ordering.
Unfortunately, all possible orderings for $n$ variables number $n!$, so this completion
is exponential. Possibly, the problem can be resolved by sampling orderings and
imposing constraints when subsets of nodes have been ordered. So far as we know,
such an approach to causal discovery has yet to be attempted.
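To make the combinatorial point concrete, the following back-of-the-envelope sketch (ours, purely illustrative and not part of any discovery program) shows how quickly the number of orderings grows:

    from math import factorial

    # The number of distinct total orderings of n variables is n!.
    for n in (5, 10, 20, 40):
        print(f"{n} variables -> {factorial(n):,} orderings")

Even 20 variables yield roughly 2.4 x 10^18 orderings, so re-running a discovery algorithm once per ordering is hopeless beyond toy problems.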
6.3.1.2 Markov equivalence summary
In summary, Markov equivalence classes of causal models, or patterns, are related to
each other graphically by Verma and Pearl’s Theorem 6.1: they share skeletons and
v-structures. They are related statistically by having identical maximum likelihoods,
and so, by orthodox statistical criteria, they are not distinguishable. Despite that
limitation, learning the patterns from observational data is an important, and large,
first step in causal learning. We do not yet know, however, how close we can get in
practice towards that goal, since the CI algorithm is itself a highly idealized one. So:
in reality, how good can we get at learning causal patterns?
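Before turning to that question, Verma and Pearl's graphical criterion from Theorem 6.1 can itself be sketched directly in code. The sketch below is our own illustration; the dictionary-of-parents graph representation and the function names are simply conveniences, not drawn from any particular package.

    from itertools import combinations

    def skeleton(dag):
        """Undirected edge set of a DAG given as {node: set of parents}."""
        return {frozenset((child, parent))
                for child, parents in dag.items() for parent in parents}

    def v_structures(dag):
        """Colliders X -> Z <- Y in which X and Y are not themselves adjacent."""
        skel = skeleton(dag)
        return {(frozenset((x, y)), z)
                for z, parents in dag.items()
                for x, y in combinations(parents, 2)
                if frozenset((x, y)) not in skel}

    def markov_equivalent(dag1, dag2):
        """Verma and Pearl: same skeleton and same v-structures."""
        return (skeleton(dag1) == skeleton(dag2)
                and v_structures(dag1) == v_structures(dag2))

    # A -> B -> C and A <- B <- C share a skeleton and have no colliders,
    # so they fall in the same pattern; A -> B <- C does not.
    chain     = {'A': set(), 'B': {'A'}, 'C': {'B'}}
    reversed_ = {'A': {'B'}, 'B': {'C'}, 'C': set()}
    collider  = {'A': set(), 'B': {'A', 'C'}, 'C': set()}
    print(markov_equivalent(chain, reversed_))   # True
    print(markov_equivalent(chain, collider))    # False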
6.3.2 PC algorithm
Verma and Pearl’s CI algorithm appears to depend upon a number of unrealistic fea-
tures. First, it depends upon knowledge of the actual conditional independencies
between variables. How is such knowledge to be gained? Of course, if one has ac-
cess to the true causal structure through an actual oracle, then the independencies
and dependencies can be read off that structure using the d-separation criterion. But
lacking such an oracle, one must somehow infer conditional independencies from
observational data. The second difficulty with the algorithm is that it depends upon
examining independencies between all pairs of variables given every subset of vari-
ables not containing the pair in question. But the number of such alternative subsets
is exponential in the number of variables in the problem, making any direct imple-
mentation of this algorithm unworkable for large problems.
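A rough count makes the second difficulty vivid; the sketch below (again ours, purely illustrative) tallies the conditional independence tests a naive implementation would face, since each of the C(n, 2) pairs has 2^(n-2) candidate conditioning subsets drawn from the remaining variables:

    from math import comb

    def naive_ci_tests(n):
        # C(n, 2) pairs, each tested against every subset of the other n - 2 variables.
        return comb(n, 2) * 2 ** (n - 2)

    for n in (5, 10, 20, 30):
        print(f"{n} variables -> {naive_ci_tests(n):,} tests")

Thirty variables already call for over a hundred billion tests, which is why practical algorithms such as PC must prune the conditioning sets they examine.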
The causal discovery program TETRAD II copes with the first problem by ap-
plying a statistical significance test for conditional independence. For linear
models, conditional independence $X \perp Y \mid S$ is represented by the zero partial
correlation $\rho_{XY \cdot S} = 0$ (also described as a vanishing partial correlation), that is,
the correlation remaining between $X$ and $Y$ when the set $S$ is held constant. The
standard