SIMPLE SIZE AND SHAPE VARIABLES: BOOKSTEIN SHAPE COORDINATES 57
same triangle to different baselines differ mainly by translation, rotation and rescaling.
In effect, all the statistical results are (approximately) the same regardless of the choice of
baseline. However, this does not mean the baseline should be chosen arbitrarily. First, some
landmarks are especially difficult to locate, and therefore to digitize reliably; these should not
serve as endpoints of the baseline. This is because the method, in effect, transfers all the
variance in the baseline points to all the other landmarks, so if the endpoints of the baseline
are highly variable then all the points will be noisy. More problematically, the variance
is not evenly distributed across all landmarks; the transfer of variance might therefore
introduce a bias into the data. Another consideration that enters into choosing a baseline
is its orientation. If the baseline rotates relative to a body axis it does not compromise
the statistical analyses, but it can make interpretations based on graphics difficult – it
might seem that all the landmarks are moving away from the baseline in the posterodorsal
direction, for example, when the baseline rotates in the anteroventral direction. Also, in
choosing the endpoints of the baseline, we do not want points that are too close to each
other because any highly localized variation in shape may be common to both those points.
Just as the noise of the baseline landmarks is transferred to all the others, the variance local
to the baseline landmarks is transferred to all the other landmarks. Ideally, therefore, we
want endpoints of the baseline to be along the longest diameter of the form that passes
through the centroid of the form, so long as those points are not especially unreliable and
the longest diameter does not rotate relative to the body axes.
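The transfer of baseline variance to the other landmarks can be illustrated with a minimal sketch. It assumes the convention that the baseline endpoints are sent to (0, 0) and (1, 0), and it uses complex arithmetic to perform the translation, rotation and rescaling in one step. The function names `bookstein_coordinates` and `longest_baseline` and the example triangle are hypothetical, introduced only for illustration. The helper `longest_baseline` covers only the "longest diameter" part of the advice above; it does not check whether that diameter passes through the centroid or whether its endpoints are reliable.

```python
from itertools import combinations


def bookstein_coordinates(landmarks, base_start, base_end):
    """Translate, rotate and rescale so that landmark base_start maps to
    (0, 0) and landmark base_end maps to (1, 0) (assumed convention)."""
    z = [complex(x, y) for x, y in landmarks]
    origin = z[base_start]
    baseline = z[base_end] - origin
    if baseline == 0:
        raise ValueError("baseline endpoints coincide")
    # Dividing by the baseline vector rotates and rescales in one step.
    shape = [(p - origin) / baseline for p in z]
    return [(w.real, w.imag) for w in shape]


def longest_baseline(landmarks):
    """Return the index pair of the two landmarks that are farthest apart."""
    def squared_distance(pair):
        (x1, y1), (x2, y2) = landmarks[pair[0]], landmarks[pair[1]]
        return (x1 - x2) ** 2 + (y1 - y2) ** 2
    return max(combinations(range(len(landmarks)), 2), key=squared_distance)


# Hypothetical triangle with landmarks A, B, C.
triangle = [(0.0, 0.0), (4.0, 0.0), (1.0, 2.0)]
# Same triangle, but with digitizing error at baseline endpoint B only.
noisy = [(0.0, 0.0), (4.0, 0.4), (1.0, 2.0)]

clean_c = bookstein_coordinates(triangle, 0, 1)[2]
noisy_c = bookstein_coordinates(noisy, 0, 1)[2]
print("C, clean baseline:", clean_c)
print("C, noisy baseline:", noisy_c)
print("longest baseline:", longest_baseline(triangle))
```

Landmark C has the same raw coordinates in both triangles, yet its shape coordinates differ, because the error at B has been absorbed into the fixed baseline and reappears at C.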
It is easiest to interpret results when the baseline lies along an organismal body axis.
Even though results can be interpreted in a baseline-invariant way, the interpretations still
refer to sides of the triangle. It is most convenient when at least one side is a conventional
and familiar reference. Bookstein has put a great deal of emphasis on baseline-invariant
interpretations out of a concern for reports free from arbitrary, abiological decisions.
However, organismal body axes are neither arbitrary nor abiological – indeed, we often
want to make explicit references to organismal body axes in our interpretations. Thus,
even though we can interpret shape changes without reference to organismal body axes,
we might still wish to orient our findings with respect to them. This motivates choosing a
baseline along one of those axes.
Statistics of shape coordinates
Once we have shape coordinates, we can answer the basic “existential” questions as defined
in Chapter 1, such as “do these samples differ in shape?” These questions have “yes” or
“no” answers supplied by statistical tests. All conventional statistical methods and tests
can be applied to shape coordinates and centroid size. For example, an average value
for the shape coordinate at point C is computed by summing the X-coordinates for that
point across all individuals within a sample, then dividing that sum by the total number
of individuals in that sample; the same procedure is then applied to the Y-coordinates.
Variances and standard deviations are also calculated by standard formulae. Because the
two endpoints of the baseline are fixed, they have no variance and should not be included
in statistical analyses. If you use conventional statistical packages to analyze these
coordinates, remember to exclude the baseline endpoints from the analysis because many
programs will not run if the variables do not vary.
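As a minimal sketch of these calculations, the means and variances of the free landmarks can be computed with Python's standard `statistics` module, skipping the baseline endpoints. The sample values, the `summarize` function and the `baseline` index set are hypothetical, introduced only for illustration. Note that `statistics.variance` returns the sample variance (divisor n − 1).

```python
from statistics import mean, variance

# Hypothetical sample of three triangles, already in shape coordinates.
# Each row is one individual; landmarks 0 and 1 are the baseline endpoints
# (fixed at (0, 0) and (1, 0)), and landmark 2 is the free point C.
sample = [
    [(0.0, 0.0), (1.0, 0.0), (0.25, 0.50)],
    [(0.0, 0.0), (1.0, 0.0), (0.35, 0.60)],
    [(0.0, 0.0), (1.0, 0.0), (0.30, 0.40)],
]
baseline = {0, 1}  # indices of landmarks with no variance


def summarize(sample, baseline):
    """Mean and sample variance of X and Y for each non-baseline landmark."""
    summary = {}
    for k in range(len(sample[0])):
        if k in baseline:
            continue  # fixed points do not vary, so they are excluded
        xs = [individual[k][0] for individual in sample]
        ys = [individual[k][1] for individual in sample]
        summary[k] = {
            "mean": (mean(xs), mean(ys)),
            "variance": (variance(xs), variance(ys)),
        }
    return summary


stats = summarize(sample, baseline)
for landmark, values in stats.items():
    print(landmark, values)
```

Only landmark C appears in the summary; the baseline endpoints are dropped before any statistic is computed, which is the same exclusion a conventional statistical package would require.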