130 Modeling Molecular Evolution
4.3. Conditional Probabilities
When base substitutions occur in the evolution of DNA, the probability of a
particular base appearing at a site in the descendent sequence might depend
on the ancestral base. For example, if the ancestral base is a T, we would
expect the probability of a T in the descendent to be high. If the ancestral
base is a C, we would expect a lower probability of the descendent having a
T, since a transition is less likely than no change. If the ancestral base is an
A or G, we might expect an even lower probability that the descendent has a
T, because transversions might be rarer than transitions.
To formalize this, we need the concept of conditional probability. This is
the probability of one event given that we know another event has occurred.
Letting $S_0$ refer to the ancestor and $S_1$ the descendent, we’ll use notation like
“$S_0 = C$” to mean that the ancestral site has base C, and “$S_1 = T$” to mean
the descendent site has base T. Then,
$$P(S_1 = T \mid S_0 = C) = 0.02$$
will mean that there is a 2% chance that the descendent base is a T given
that the ancestral base is a C. Note that the vertical bar “|” in this conditional
probability notation is read as “given that.” We now have a good way to refer
to the fact that the probability of a “final” base appearing depends on the “initial”
base that appeared.
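One way to make the definition concrete is to estimate a conditional probability by relative frequency: among the sites where the ancestral base is C, what fraction have a T in the descendent? The sketch below assumes this frequency-based estimate; the two aligned sequences are hypothetical toy data, and the function name `conditional_probability` is chosen only for illustration.

```python
from collections import Counter

# Hypothetical aligned sequences (toy data for illustration, not real DNA).
ancestor   = "CCCTCACCGTCCACTC"
descendent = "CCCTTACCGTCCACTC"

def conditional_probability(s0, s1, final_base, initial_base):
    """Estimate P(S1 = final_base | S0 = initial_base) by relative frequency."""
    # Count each (ancestral base, descendent base) pair across aligned sites.
    pairs = Counter(zip(s0, s1))
    # Number of sites satisfying the given condition S0 = initial_base.
    given = sum(n for (a, _), n in pairs.items() if a == initial_base)
    if given == 0:
        raise ValueError(f"no sites with ancestral base {initial_base}")
    # Restrict attention to those sites, then ask how many show final_base.
    return pairs[(initial_base, final_base)] / given

p_t_given_c = conditional_probability(ancestor, descendent, "T", "C")
p_c_given_c = conditional_probability(ancestor, descendent, "C", "C")
print(p_t_given_c, p_c_given_c)
```

Here the condition “given that $S_0 = C$” appears as the denominator: only the sites with ancestral base C are counted, and all other sites are ignored.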
Taking into account the previous comments on the likelihood of transitions
and transversions, which of $P(S_1 = A \mid S_0 = C)$, $P(S_1 = G \mid S_0 = C)$,
$P(S_1 = C \mid S_0 = C)$, and $P(S_1 = T \mid S_0 = C)$ are likely to be smallest?
Which is likely to be biggest?
The properties of probabilities discussed earlier carry over to the setting of
conditional probabilities, as long as we keep in mind we are always assuming
something particular happened – the given condition. For instance,
$$P(S_1 = A \mid S_0 = C) + P(S_1 = G \mid S_0 = C) + P(S_1 = C \mid S_0 = C) + P(S_1 = T \mid S_0 = C) = 1.$$
After all, given that $S_0 = C$, the four events $S_1 = A$, $G$, $C$, and $T$ are mutually
exclusive, yet certainly one of them will occur, and so the probabilities must
add to 1.
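As a small numerical check of this property, the sketch below stores the four conditional probabilities given $S_0 = C$ in a dictionary. Only the value 0.02 for T comes from the text; the values for A, G, and C are assumptions chosen for illustration so that the four add to 1.

```python
import math

# P(S1 = x | S0 = C) for each possible descendent base x.
# Only the entry for T (0.02) is from the text; the others are assumed values.
p_given_c = {"A": 0.005, "G": 0.005, "C": 0.97, "T": 0.02}

# The four events are mutually exclusive and one must occur, so they sum to 1.
total = sum(p_given_c.values())
assert math.isclose(total, 1.0)

# The same property gives the probability of any change at the site,
# as the complement of "no change" given S0 = C.
p_change = 1.0 - p_given_c["C"]
print(total, p_change)
```

With these assumed values, the probability that the base changes given an ancestral C is the sum of the three substitution probabilities, which matches the complement of $P(S_1 = C \mid S_0 = C)$.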
Example. The conditional probability $P(S_1 = T \mid S_0 = C)$ is not the same
as the probability $P(S_1 = T \text{ and } S_0 = C)$. To see this clearly, suppose we