Analysis of Multivariate Data
be relatively easy. As an exercise, it may be instructive to calculate the significance of the discriminant function for the example we have just worked.
Not all of the variables we have included in the discriminant function will be equally useful in distinguishing one group from another. We may wish to isolate those variables that are not especially helpful and eliminate them from future analyses. Selecting the most effective set of discriminators for discriminant function analysis would seem to be analogous to selecting the most efficient predictors in multiple regression. The problem, however, is more complicated because the “dependent” or predicted variable in a discriminant function is composed of differences between two sets of the same variables that are used as “independent” predictors of the discrimination. Unlike regression, where the sums of squares of $y$ do not change as different variables $X_j$ are added to the equation, the sums of squares of the differences between groups A and B do change as variables are added or deleted.
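The dependence of the between-group separation on the chosen variable set can be illustrated with a short sketch. It assumes that separation is measured by the Mahalanobis distance D² formed from the vector of mean differences and the pooled covariance matrix. The function names (`pooled_covariance`, `solve`, `separation`) and the two small data sets are hypothetical, chosen only for illustration.

```python
from statistics import mean


def pooled_covariance(a, b):
    """Return (pooled covariance matrix, mean-difference vector) for two groups.

    a, b: lists of observations; each observation is a list of variable values.
    """
    m = len(a[0])
    mean_a = [mean(col) for col in zip(*a)]
    mean_b = [mean(col) for col in zip(*b)]
    s = [[0.0] * m for _ in range(m)]
    # Accumulate within-group sums of squares and cross products.
    for rows, mu in ((a, mean_a), (b, mean_b)):
        for row in rows:
            for j in range(m):
                for k in range(m):
                    s[j][k] += (row[j] - mu[j]) * (row[k] - mu[k])
    dof = len(a) + len(b) - 2
    cov = [[value / dof for value in row] for row in s]
    diff = [x - y for x, y in zip(mean_a, mean_b)]
    return cov, diff


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(aug[r][i]))
        aug[i], aug[pivot] = aug[pivot], aug[i]
        for r in range(i + 1, n):
            factor = aug[r][i] / aug[i][i]
            for c in range(i, n + 1):
                aug[r][c] -= factor * aug[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def separation(a, b, variables):
    """Mahalanobis D^2 between groups a and b using only the listed columns."""
    sub_a = [[row[j] for j in variables] for row in a]
    sub_b = [[row[j] for j in variables] for row in b]
    cov, diff = pooled_covariance(sub_a, sub_b)
    coefficients = solve(cov, diff)  # discriminant coefficients for this subset
    return sum(c * d for c, d in zip(coefficients, diff))


# Hypothetical data: four observations on two variables in each group.
group_a = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
group_b = [[3.0, 6.0], [4.0, 5.0], [5.0, 8.0], [6.0, 7.0]]

for subset in ([0], [1], [0, 1]):
    print(subset, separation(group_a, group_b, subset))
```

For these made-up data the separation is about 2.4 with the first variable alone, about 9.6 with the second alone, and about 9.75 with both. The quantity being explained is therefore not fixed in advance, as the sum of squares of y is in regression, but is recomputed for each set of variables.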
Some idea of the effectiveness of the variables as discriminators can be gained by computing the standardized differences,

$$ d_j = \frac{\bar{A}_j - \bar{B}_j}{s_{p_j}} \tag{6.28} $$

This is simply the difference between the means of the two groups A and B for variable j, divided by the pooled standard deviation of variable j. Since the measure does not consider interactions between variables, it is useful only as a general guide to discriminating power. Stepwise discriminant analysis programs may use standardized differences in choosing the order in which variables are added to the discriminant function. Marascuilo and Levin (1983) discuss “after-the-fact” contrast procedures that can be used to select the most important variables. However, the significance of different combinations of variables can be tested only by computing the various functions and determining the relative amounts of separation the different equations produce between the two groups. To avoid bias, such tests should be run on independent random samples.
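The standardized differences described above can be computed directly from the group data. The following is a minimal sketch, assuming the pooled standard deviation is the square root of the within-group variances weighted by their degrees of freedom. The function name `standardized_differences` and the example data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_differences(group_a, group_b):
    """Return one standardized difference per variable.

    group_a, group_b: lists of observations; each observation is a list
    holding the values of the same variables in the same order.
    """
    n_a, n_b = len(group_a), len(group_b)
    result = []
    # zip(*group) turns rows of observations into columns of variables.
    for col_a, col_b in zip(zip(*group_a), zip(*group_b)):
        # Pooled variance: sample variances weighted by degrees of freedom
        # (assumed definition of "pooled" for this sketch).
        pooled_var = ((n_a - 1) * variance(col_a)
                      + (n_b - 1) * variance(col_b)) / (n_a + n_b - 2)
        result.append((mean(col_a) - mean(col_b)) / sqrt(pooled_var))
    return result


# Hypothetical data: three observations on two variables in each group.
group_a = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0]]
group_b = [[4.0, 10.5], [5.0, 11.5], [6.0, 11.0]]

diffs = standardized_differences(group_a, group_b)
print(diffs)
```

In this made-up example the first variable has a standardized difference of -3 and the second a difference of 0, so only the first would be expected to contribute to the discrimination. Because each variable is treated separately, a ranking obtained this way ignores correlations among the variables, as the text cautions.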
Discriminant function analysis provides a natural transition between two major classes of multivariate statistical techniques. On one hand, it is closely related to multiple regression and trend-surface analysis. On the other, it can be expressed as an eigenvalue problem, related to principal component analysis, factor analysis, and similar multivariate methods. There are advantages to the use of eigenvectors in calculating the discriminant function, because they allow us to simultaneously discriminate between more than two groups. However, we will delay a consideration of this topic until we examine the basic elements of eigenvector analysis and some of the simpler eigenvector techniques.
Multivariate Extensions of Elementary Statistics
In Chapter 2, we considered some simple geologic problems that could be examined by elementary statistical methods. We will begin our consideration of multivariate methods in geology with some direct extensions of these simple tests. You will recall that the variation measured in most naturally occurring phenomena could be described by the normal distribution. This is a reflection of the central limit theorem, which states that observations which are the sums of many independently operating processes tend to be normally distributed as the number of effects becomes