202 Constructing Phylogenetic Trees
more than a few taxa, the number of tree topologies that must be considered
is huge. Thus, the parsimony method is practical only when carried out on a
computer. In fact, with a large number of taxa, the number of possible trees is
so large that computer programs often examine only a subset of them when
searching for the most parsimonious. Good software, operated by knowledgeable users, can
often find what are likely to be the most parsimonious trees, but there is no
guarantee. (This has caused some embarrassment to researchers publishing
trees without understanding the operation of the software they used to produce
those trees.)
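To get a feel for how quickly the number of topologies grows, the following minimal sketch counts them. It assumes we are counting unrooted bifurcating trees with labeled leaves, for which the standard count for n taxa is the product of the odd numbers 1 · 3 · 5 · · · (2n − 5). The function name count_unrooted_trees is ours, chosen only for illustration.

```python
def count_unrooted_trees(n_taxa: int) -> int:
    """Count unrooted bifurcating tree topologies on n_taxa labeled leaves.

    Assumes the standard formula 1 * 3 * 5 * ... * (2n - 5), valid for n >= 3.
    """
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (the leading factor 1 is implicit).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, count_unrooted_trees(n))
```

Under this assumption there are 3 trees for 4 taxa and 15 for 5 taxa, but already 2,027,025 for 10 taxa, which is why an exhaustive search soon becomes impractical.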
We can save some effort in using the parsimony method if we make the
observation that not all sites will affect the number of mutations needed for a
tree. The obvious case is that if all sequences have the same base at a particular
site, then all trees will need 0 mutations for that site. Thus, we can eliminate
that site from our sequences before applying the algorithm. A less obvious
case is when at a site all sequences have the same base (say A), except for
at most one sequence each with the other bases (C, T, and G). In this case,
regardless of the tree topology, if we put an A at every interior vertex, then
we attain the minimum possible number of mutations: one for each sequence
whose base is not A. That means such a site
will not influence what tree we pick as most parsimonious. This leads to:
Definition. An informative site is one at which at least two different bases
occur at least twice each among the sequences being considered.
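The definition translates directly into a test on one column of an alignment. The sketch below is one possible way to write it, assuming the sequences are already aligned and stored as equal-length strings; the names is_informative and informative_sites, and the four example sequences, are hypothetical and used only for illustration.

```python
from collections import Counter


def is_informative(column):
    """True if at least two different bases each occur at least twice."""
    counts = Counter(column)
    bases_seen_twice = sum(1 for c in counts.values() if c >= 2)
    return bases_seen_twice >= 2


def informative_sites(sequences):
    """Return the 0-based indices of the informative sites.

    Assumes `sequences` is a non-empty list of aligned, equal-length strings.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i for i in range(length)
        if is_informative([s[i] for s in sequences])
    ]


if __name__ == "__main__":
    # Hypothetical aligned sequences, one per taxon.
    seqs = ["AAAAA", "AAACA", "ACCAA", "AGCCT"]
    print(informative_sites(seqs))  # sites 2 and 3 (0-based) are informative
```

In this example the first site is constant, the second and fifth have only one base occurring more than once, and only the third and fourth sites satisfy the definition.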
Before applying the parsimony algorithm, we can eliminate all noninformative
sites from our sequences, because they will not affect the choice of
most parsimonious tree. Note that only informative sites were used in the
previous examples.
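We can check this claim on a small case. The sketch below counts the mutations needed at a single site on each of the three possible unrooted trees for four taxa. It uses a Fitch-style bottom-up count, which is our assumption about how the minimum is computed and may differ in presentation from the parsimony algorithm described earlier; each tree is written as a nested tuple with an arbitrary root, which does not change the count. The taxon names S1 through S4 and the two example sites are hypothetical.

```python
def fitch_count(tree, bases):
    """Return (possible_states, mutations) for one site on a binary tree.

    `tree` is a taxon name (leaf) or a pair of subtrees; `bases` maps each
    taxon name to its base at the site. Fitch-style count: an assumption
    made for this sketch.
    """
    if isinstance(tree, str):  # leaf
        return {bases[tree]}, 0
    left, right = tree
    left_states, left_count = fitch_count(left, bases)
    right_states, right_count = fitch_count(right, bases)
    common = left_states & right_states
    if common:
        # The children can agree on a base: no new mutation is needed here.
        return common, left_count + right_count
    # The children must differ: one more mutation is needed.
    return left_states | right_states, left_count + right_count + 1


# The three unrooted topologies relating four taxa.
QUARTETS = [
    (("S1", "S2"), ("S3", "S4")),
    (("S1", "S3"), ("S2", "S4")),
    (("S1", "S4"), ("S2", "S3")),
]


def scores(bases):
    """Mutation counts for one site on each of the three quartet trees."""
    return [fitch_count(tree, bases)[1] for tree in QUARTETS]


if __name__ == "__main__":
    noninformative = {"S1": "A", "S2": "A", "S3": "C", "S4": "G"}
    informative = {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}
    print(scores(noninformative))  # [2, 2, 2]: same count on every tree
    print(scores(informative))     # [1, 2, 2]: the first tree is favored
```

The noninformative site adds the same number of mutations to every tree, so it cannot change which tree is most parsimonious, while the informative site distinguishes among the trees.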
The Maximum Parsimony method does not use the Jukes-Cantor model of
molecular evolution, nor any other explicit model of DNA mutation. Instead, it
carries an implicit assumption that mutation is rare, and the best explanation
of evolutionary history is the one that requires the fewest mutations. There
has been a vigorous, and at times acrimonious, debate between researchers
advocating model-based methods of tree reconstruction and those advocating
parsimony. Rather than join a philosophical argument, we simply point out
that when there are few mutations obscuring previous mutations, both distance
and parsimony methods seem to work well in practice. The assumptions of
both can be justifiably criticized, and much work is still being done to find
better methods.