
Each of these values is the result of some counting process. The point is that, with
this theorem, computing $P(h_i, e)$ has become a straightforward counting problem:
it is equal to the prior $P(h_i)$ times a simple function of the number of assignments
to parent and child variables and the number of matching cases in the sample.
Furthermore, Cooper and Herskovits showed that this computation of $P(h_i, e)$ is
polynomial, i.e., computing $P(h_i, e)$ given a particular $h_i$ is tractable under
the assumptions so far.
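To make the counting concrete, the following is a minimal sketch (our own Python
illustration, not Cooper and Herskovits' code) of the score contribution for a single
child variable, computed in log space with lgamma to avoid overflow. The function
name, the dict-per-case data representation and the arity argument are assumptions
made for the example.

    from math import lgamma
    from collections import Counter

    def log_ch_family_score(data, child, parents, arity):
        # Log of the Cooper-Herskovits counting term for one child variable:
        # for each observed parent instantiation k,
        #   log (r-1)! - log (N_k + r - 1)! + sum_v log N_kv!
        # where r is the child's arity, N_kv counts cases with the parents in
        # instantiation k and the child in state v, and N_k = sum_v N_kv.
        # Unobserved parent instantiations contribute log 1 = 0 and are skipped.
        r = arity[child]
        n_kv = Counter()   # joint counts over (parent instantiation, child state)
        n_k = Counter()    # counts over parent instantiations alone
        for case in data:  # each case maps variable name -> state in 0..arity-1
            k = tuple(case[p] for p in parents)
            n_kv[(k, case[child])] += 1
            n_k[k] += 1
        score = 0.0
        for k, nk in n_k.items():
            score += lgamma(r) - lgamma(nk + r)    # log (r-1)! - log (N_k + r - 1)!
            for v in range(r):
                score += lgamma(n_kv[(k, v)] + 1)  # log N_kv!
        return score

The full metric is then the prior times the product of such per-family terms (or, in
log space, their sum), which is why scoring any particular structure reduces to a
single counting pass over the sample.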
Unfortunately, while the metric may be tractable, we still have to search the space
of possible dags, which we know is exponentially large. At this point, Cooper and
Herskovits go for a dramatic final simplifying assumption:
6. Assume we know the temporal ordering of the variables.
If we rely on this assumption, the search space is greatly reduced. In fact, any
pair of variables is either connected by an arc or not. Given prior knowledge of the
ordering, we need no longer worry about arc orientation, as that is fixed. Hence,
the model space is determined by the number of pairs of variables, with two raised
to that power being the number of possible skeleton models. That is, the new
hypothesis space has size only $2^{n(n-1)/2}$. The K2 algorithm simply
performs a greedy search through this reduced space. This reduced space remains,
of course, exponential.
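For concreteness, a companion sketch of K2's greedy parent search (again an
illustration rather than the authors' code, reusing log_ch_family_score from above;
the max_parents cap is a common practical bound and an assumption here) might run
as follows.

    def k2_search(data, order, arity, max_parents=3):
        # For each variable, greedily add the predecessor (in the supplied
        # ordering) whose addition most improves the family score, stopping
        # when no addition helps or the parent limit is reached.
        parents = {x: [] for x in order}
        for i, child in enumerate(order):
            best = log_ch_family_score(data, child, parents[child], arity)
            while len(parents[child]) < max_parents:
                candidates = [c for c in order[:i] if c not in parents[child]]
                if not candidates:
                    break
                scored = [(log_ch_family_score(data, child, parents[child] + [c], arity), c)
                          for c in candidates]
                s, c = max(scored)
                if s <= best:
                    break
                best = s
                parents[child].append(c)
        return parents

Even with the ordering fixed, the space being searched is the $2^{n(n-1)/2}$ skeletons
just counted, which is why K2 settles for a greedy rather than an exhaustive search.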
In any case, so far we have in hand two algorithms for discovering discrete causal
structure: TETRAD (i.e., PC with a $\chi^2$ test) and K2.
8.2.1 Learning variable order
Our view is that reliance upon the variable order being provided is a major drawback
to K2, as it is to many other algorithms we will not have time to examine in detail
(e.g., [35, 27, 273, 179]). Why should we care? It is certainly the case that in
many problems we have either a partial or a total ordering of variables in pre-existing
background knowledge, and it would be foolish not to use all available information to
aid causal discovery. Both TETRAD II and CaMML, for example, allow such prior
information to be used to boost the discovery process (see 8.6.3). But it is one thing to
allow such information to be used and quite another to depend upon that information.
This is a particularly odd restriction in the domain of causal discovery, where it is
plain from Chapter 6 that a fair amount of information about causal ordering can be
learned directly from the data, using the Verma-Pearl CI algorithm.
In principle, what artificial intelligence is after is the development of an agent
which has some hope of overcoming problems on its own, rather than requiring en-
gineers and domain experts to hold its hand constantly. If intelligence is going to
be engineered, this is simply a requirement. One of the serious impediments to the
success of first-generation expert systems in the 1970s and 80s was that they were
brittle: when the domain changed, or the problem focus changed to include anything
(Footnote: We should point out that there are again many others, in addition to our
CaMML, which do not depend upon a prior variable ordering, such as TETRAD, MDL
(see 8.3) and GES [188].)