then we also want to minimize the variance that it does not describe – in other words, we
want to minimize the sum of the squared distances of points away from the line (Figure
7.1C). (Note: the distances that are minimized by PCA are not the distances minimized in
conventional least-squares regression analysis – see Chapter 10.)
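To make that distinction concrete, the following sketch (in Python with NumPy, using a made-up bivariate sample) finds PC1 as the direction of largest variance and contrasts the perpendicular distances it minimizes with the vertical residuals minimized by a regression line; the two lines generally differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical correlated bivariate sample (the "ellipse" of Figure 7.1)
x = rng.normal(0.0, 2.0, 500)
y = 0.6 * x + rng.normal(0.0, 1.0, 500)
centered = np.column_stack([x, y])
centered = centered - centered.mean(axis=0)

# PC1: eigenvector of the covariance matrix with the largest eigenvalue
evals, evecs = np.linalg.eigh(np.cov(centered, rowvar=False))
pc1 = evecs[:, np.argmax(evals)]

# Sum of squared perpendicular distances of the points from the PC1 line:
# this is the quantity that PCA minimizes
residual = centered - np.outer(centered @ pc1, pc1)
ss_perpendicular = np.sum(residual ** 2)

# A least-squares regression line minimizes vertical residuals instead,
# so its slope generally differs from the slope of PC1
slope = np.polyfit(centered[:, 0], centered[:, 1], 1)[0]
print("PC1 slope:", pc1[1] / pc1[0], " regression slope:", slope)
```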
The next step is to describe the variation that is not described by PC1. When there are
only two original variables this is a trivial step; all of the variation that is not described by
the first axis of the ellipse is completely described by the second axis. So, let us consider
briefly the case in which there are three observed traits: X1, X2 and X3. This situation is
unlikely to arise in optimally superimposed landmark data, but it illustrates a generalization
that can be applied to more realistic situations. As in the previous example, all traits
are normally distributed and no trait is independent of the others. In addition, X1 has
the largest variance and X3 has the smallest variance. A three-dimensional model of this
distribution would look like a partially flattened blimp or watermelon (Figure 7.2A). Again
PC1 is the direction in which the sample has the largest variance (the long axis of the
watermelon), but now a single line perpendicular to PC1 is not sufficient to describe the
remaining variance. If we cut the watermelon in half perpendicular to PC1, the cross-
section is another ellipse (Figure 7.2B). The individuals in the section (the seeds in the
watermelon) lie in various directions around the central point, which is where PC1 passes
through the section. Thus, the next step of the PCA is to describe the distribution of data
points around PC1, not just for the central cross-section, but also for the entire length of
the watermelon.
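A rough numerical stand-in for this three-trait case can be simulated directly; the variances and covariances below are invented purely for illustration. PC1 is again the direction in which the sample has its largest variance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up covariances: X1 has the largest variance, X3 the smallest,
# and no trait is independent of the others
cov = np.array([[4.0, 1.5, 0.8],
                [1.5, 2.0, 0.6],
                [0.8, 0.6, 1.0]])
sample = rng.multivariate_normal(mean=[10.0, 8.0, 5.0], cov=cov, size=300)

# PC1: the eigenvector of the sample covariance matrix with the largest
# eigenvalue, i.e. the long axis of the "watermelon"
centered = sample - sample.mean(axis=0)
evals, evecs = np.linalg.eigh(np.cov(centered, rowvar=False))
pc1 = evecs[:, np.argmax(evals)]
print("variance along PC1:", evals.max())
```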
To describe the variation that is not represented by PC1, we need to map, or project,
all of the points onto the central cross-section (Figure 7.2C). Imagine standing the halved
watermelon on the cut end and instantly vaporizing the pulp so that all of the seeds drop
vertically onto a sheet of wax paper, then repeating the process with the other half of
the watermelon and the other side of the paper. The result of this mapping is a two-
dimensional elliptical distribution similar to the first example. This ellipse represents the
variance that is not described by PC1. Thus, the next step of the three-dimensional PCA
is the first step of the two-dimensional PCA – namely, solving for the long axis of a
two-dimensional ellipse, as outlined above. In the three-dimensional case, the long axis
of the two-dimensional ellipse will be PC2. The short axis of this ellipse will be PC3, and
will complete the description of the distribution of seeds in the watermelon. By logical
extension, we can consider N variables measured on some set of individuals to represent
an N-dimensional ellipsoid. The PCs of this data set will be the N axes of the ellipsoid.
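The watermelon construction can be written out step by step: find the direction of largest remaining variance, project every point onto the subspace perpendicular to it (the "wax paper" step), and repeat. The sketch below (Python/NumPy, written for illustration only) does exactly that; for such data the axes it returns agree, up to sign, with the eigenvectors of the covariance matrix.

```python
import numpy as np

def principal_axes(data):
    """Return the PCs as columns, found one at a time: take the direction
    of largest remaining variance, then project it out and repeat in the
    lower-dimensional cross-section."""
    centered = data - data.mean(axis=0)
    axes = []
    residual = centered.copy()
    for _ in range(centered.shape[1]):
        evals, evecs = np.linalg.eigh(np.cov(residual, rowvar=False))
        axis = evecs[:, np.argmax(evals)]
        axes.append(axis)
        # "Vaporize the pulp": drop every point onto the plane (or subspace)
        # perpendicular to the axis just found
        residual = residual - np.outer(residual @ axis, axis)
    return np.column_stack(axes)
```

Applied to the three-trait sample sketched earlier, the three columns returned are PC1, PC2 and PC3, the axes of the ellipsoid.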
After the variation in the original variables has been redescribed in terms of the PCs, we
want to know the positions of the individual specimens relative to these new axes (Figure
7.3). As shown in Figure 7.3A, the values we want are determined by the orthogonal
projections of the specimen onto the PCs. These new distances are called principal component
scores. Because the PCs intersect at the sample mean, the values of the scores represent
the distances of the specimen from the mean in the directions of the PCs. In effect, we are
rotating and translating the ellipse into a more convenient orientation so we can use the
PCs as the basis for a new coordinate system (Figure 7.3B). The PCs are the axes of that
system. All this does is allow us to view the data from a different perspective; the positions
of the data points relative to each other have not changed.
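In the same spirit, the scores are easy to compute for a made-up sample: project the mean-centered data onto the PCs. Because this is only a rotation (after translating to the mean), the distances between specimens are unchanged; the covariance values below are again invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
sample = rng.multivariate_normal([0.0, 0.0, 0.0],
                                 [[4.0, 1.5, 0.8],
                                  [1.5, 2.0, 0.6],
                                  [0.8, 0.6, 1.0]], size=50)

centered = sample - sample.mean(axis=0)
evals, evecs = np.linalg.eigh(np.cov(centered, rowvar=False))
axes = evecs[:, np.argsort(evals)[::-1]]    # PC1, PC2, PC3 as columns

# Scores: orthogonal projections of each specimen onto the PCs,
# i.e. signed distances from the mean along each PC direction
scores = centered @ axes

# The rotation does not change the positions of specimens relative to
# each other: pairwise distances are preserved
print(np.allclose(np.linalg.norm(scores[0] - scores[1]),
                  np.linalg.norm(sample[0] - sample[1])))   # True
```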
As suggested by Figure 7.4, we could compute an individual’s score on a PC from the
values of the original variables that were observed for that individual and the cosines of