THEORY OF SHAPE 97
Not only will distances from the reference be distorted, so too will the distances among
target specimens, and this distortion will also be a function of their distances from the ref-
erence. If these distortions are large, inferences based on distances in the Euclidean tangent
space will be unreliable.
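As a rough numerical illustration of how distortion grows with distance from the reference, the sketch below assumes one particular setup that the passage above does not specify: shapes scaled to unit centroid size and projected orthogonally onto the tangent space. Under that assumption, a shape at Procrustes distance rho (in radians) from the reference lies at distance sin(rho) from it in the tangent space. The function names are hypothetical.

```python
import math

def tangent_distance_from_reference(rho):
    """Tangent-space distance from the reference for a shape at Procrustes
    distance rho, ASSUMING unit centroid size and orthogonal projection."""
    return math.sin(rho)

def relative_distortion(rho):
    """Fraction by which the tangent-space distance falls short of rho."""
    return (rho - tangent_distance_from_reference(rho)) / rho

# The shortfall is negligible near the reference and grows with rho.
for rho in (0.05, 0.2, 0.5, 1.0):
    print(f"rho = {rho:4.2f}   tangent = {tangent_distance_from_reference(rho):.4f}"
          f"   shortfall = {relative_distortion(rho):.2%}")
```

For small rho the shortfall is approximately rho squared divided by 6, which is why specimens close to the reference are barely affected.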
One possible reference is the average shape of the entire sample (computed using methods
discussed in Chapter 5). This choice minimizes the average distance of specimens from
the reference, which in turn minimizes the average distortion of interspecimen distances
projected to the tangent plane (Bookstein, 1996; Rohlf, 1998). An alternative choice
of reference is a shape inferred to represent the starting point of some biological process
(e.g. a neonate in a study of ontogenetic transformation – cf. Zelditch et al., 1992). This
approach has the advantage that the difference between target and reference can be inter-
preted as a biological transformation as well as a mathematical transformation (Fink and
Zelditch, 1995; Zelditch et al., 1998). However, as Rohlf (1998) points out, this approach
can have the limitation that the reference is at one extreme of the observed distribution
of shapes, thereby increasing the risk of substantial distortions of distances when changes
in shape are large. Conceivably, such distortions could lead to erroneous inferences.
However, Marcus et al. (2000) analyzed differences in skull shape among representa-
tives of several mammalian orders and found that most Procrustes distances are closely
approximated by the Euclidean distance in the tangent space. The principal exceptions
were the distances from terrestrial taxa (especially the muskrat) to a dolphin (which is
not surprising, given the extraordinary reorganization of the cetacean head). This result
suggests that few biologists will encounter differences among specimens large enough to
call the adequacy of the linear approximation into question; for most data sets, distances
in the tangent space (based on any reference) are unlikely to approximate distances in
shape space poorly. Even so, using the average shape of all specimens in the data set as
the reference minimizes the risk that such a problem will occur. The use of any other reference
carries with it the responsibility to ensure that Euclidean distances in the tangent space are
accurate approximations of the distances in shape space.
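One way to carry out such a check is to compare every pairwise Procrustes distance with the corresponding Euclidean distance in the tangent space. The sketch below is a minimal illustration for two-dimensional landmarks, not a definitive implementation. It assumes configurations are centered and scaled to unit centroid size, are written as complex vectors, and are projected orthogonally onto the tangent space at the reference. All function names and the simulated landmark data are hypothetical, and the first specimen serves as the reference purely for convenience; the average shape would be substituted in practice.

```python
import numpy as np

def preshape(xy):
    """Center a (k, 2) landmark array, scale it to unit centroid size,
    and return it as a complex vector of length k."""
    z = xy[:, 0] + 1j * xy[:, 1]
    z = z - z.mean()
    return z / np.linalg.norm(z)

def procrustes_distance(z1, z2):
    """Arc-length (Procrustes) distance between two 2D preshapes."""
    c = abs(np.vdot(z1, z2))           # modulus removes the rotation
    return float(np.arccos(min(c, 1.0)))

def tangent_coords(z, ref):
    """Rotate z into optimal alignment with ref, then project it
    orthogonally onto the tangent space at ref."""
    inner = np.vdot(ref, z)            # conj(ref) . z
    rotated = z * np.exp(-1j * np.angle(inner))
    return rotated - np.vdot(ref, rotated).real * ref

def compare_distances(shapes, ref):
    """Return pairwise Procrustes distances and the matching
    tangent-space Euclidean distances, in the same order."""
    tangent = [tangent_coords(z, ref) for z in shapes]
    proc, tang = [], []
    for i in range(len(shapes)):
        for j in range(i + 1, len(shapes)):
            proc.append(procrustes_distance(shapes[i], shapes[j]))
            tang.append(np.linalg.norm(tangent[i] - tangent[j]))
    return np.array(proc), np.array(tang)

# Hypothetical data: six noisy copies of a five-landmark configuration.
rng = np.random.default_rng(0)
base = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 1.5]])
shapes = [preshape(base + rng.normal(scale=0.02, size=base.shape))
          for _ in range(6)]

proc, tang = compare_distances(shapes, ref=shapes[0])
print("largest absolute discrepancy:", np.max(np.abs(proc - tang)))
print("correlation:", np.corrcoef(proc, tang)[0, 1])
```

A discrepancy that is small relative to the distances themselves, together with a correlation close to 1, would indicate that the tangent-space approximation is adequate for the chosen reference.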
Dimensions and degrees of freedom
The issue of degrees of freedom (or the number of independent measurements in a system)
is important for statistical analyses, but it can be confusing, especially when talking about
shape. To clarify it, we can consider a simple example. Suppose we wish to describe the
location of a notebook in a room. We could give its location in terms of three distances
from a reference point (such as the corner of the door of the room), and this is equivalent to
defining its position by three Cartesian coordinates relative to that reference point. In this
example, there are three degrees of freedom for the location of the notebook because three
variables are required to describe it. Knowing those variables and the reference suffices to
find the notebook. However, if the notebook is on a chair, and all chairs are known to
be the same height, specifying the height conveys no more information than saying that
the notebook is on a chair. Knowing what we do about the chairs, we only need two
additional pieces of information, the X- and Y-coordinates, to specify the location of the
notebook in the room. Thus by specifying the constraint that the notebook is on a chair
of fixed height, we have removed one of the three degrees of freedom.
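The notebook example can be checked numerically. The sketch below is one possible illustration under stated assumptions: it simulates notebook positions in a hypothetical room and counts the independent directions of variation as the rank of the centered coordinate matrix. The chair height, room size, and function name are all invented for the example.

```python
import numpy as np

CHAIR_HEIGHT = 0.45   # hypothetical fixed seat height, in meters

def degrees_of_freedom(points):
    """Number of independent directions in which the points vary,
    taken as the rank of the mean-centered coordinate matrix."""
    centered = points - points.mean(axis=0)
    return int(np.linalg.matrix_rank(centered))

rng = np.random.default_rng(1)

# Notebooks anywhere in a 3 m cube: X, Y and Z are all free.
free = rng.uniform(0.0, 3.0, size=(20, 3))

# Notebooks on chairs: same X and Y, but Z is fixed by the constraint.
on_chair = free.copy()
on_chair[:, 2] = CHAIR_HEIGHT

print("unconstrained:", degrees_of_freedom(free))
print("on a chair:   ", degrees_of_freedom(on_chair))
```

Three coordinates are still recorded for each notebook on a chair, but the constraint leaves only two of them free to vary.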