Statistics and Data Analysis in Geology - Chapter 6
of one of the techniques. These methods are well described in some of the texts listed in the Selected Readings at the end of the chapter, especially in Marascuilo and Levin (1983) and in Draper and Smith (1998).
The backward elimination procedure begins by computing a regression that includes all possible variables and then identifying the least significant variable. The selection is made by finding the standardized partial regression coefficient with the smallest absolute value and then recomputing the regression with that variable omitted. The significance of the deleted variable is tested by the analysis of variance shown in Table 6-3. If the variable is not making a significant contribution to the regression, it is permanently discarded. The reduced regression model is then fitted to the data, a new set of standardized partial regression coefficients for the reduced equation is calculated, and the process is repeated. At each step the regression equation is reduced by one variable, until all remaining variables are significant.
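The two quantities used at each step can be written as follows. This sketch assumes that the analysis of variance in Table 6-3 takes the usual partial-F form. The notation is chosen here and may not match the table:

- b_j is a partial regression coefficient.
- s is a standard deviation.
- SS_R and SS_D are the regression and deviation (residual) sums of squares.
- k is the number of independent variables before the deletion.
- n is the number of observations.

```latex
% Standardized partial regression coefficient of variable j
b'_j = b_j \, \frac{s_{x_j}}{s_y}

% Partial F ratio for deleting one variable from a k-variable model
F = \frac{\left( SS_{R,k} - SS_{R,k-1} \right) / 1}{SS_{D,k} / (n - k - 1)}
```

The deleted variable is judged non-significant, and discarded, when F falls below the critical value of the F distribution with 1 and n - k - 1 degrees of freedom.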
It is instructive to examine the collection of six independent variables measured on river basins (file KENTUCKY.TXT) and see whether any can be discarded without significantly affecting the multiple regression on basin magnitude. We can find a minimal set of independent variables by examining the standardized partial regression coefficients, deleting the smallest of these, and recomputing the regression. Repeatedly running a multiple-regression program is obviously less efficient than using a stepwise computer program, but it has the advantage that every step in the process can be examined closely. Once you are confident that you understand the elimination process and the changes that occur in the regression coefficients, you may turn to a more automated procedure.
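The repeated-fitting procedure can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the book's program:

- The function names `fit_ols` and `backward_elimination` are hypothetical.
- The sketch does not look up a tabulated F value at each step. Instead it compares the partial F ratio with a fixed "F-to-remove" threshold (`f_remove`, default 4.0), the kind of threshold many stepwise programs use. This simplification is an assumption here.

The returned `history` records the standardized partial regression coefficients at every step, so each stage can be inspected in the way the text recommends.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y on the columns of X plus an intercept.

    Returns (slopes, regression sum of squares, residual sum of squares).
    """
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    ss_reg = float(np.sum((fitted - y.mean()) ** 2))
    ss_dev = float(np.sum((y - fitted) ** 2))
    return coef[1:], ss_reg, ss_dev


def backward_elimination(X, y, names, f_remove=4.0):
    """Backward elimination driven by standardized partial regression coefficients.

    f_remove is a fixed "F-to-remove" threshold used in place of a table
    lookup at each step (an assumption of this sketch).
    Returns the retained variable names and, for every step, the
    standardized coefficients of the model fitted at that step.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    keep = list(range(X.shape[1]))
    history = []
    while keep:
        slopes, ss_reg, ss_dev = fit_ols(X[:, keep], y)
        # Standardized coefficient: b_j * s_xj / s_y
        std_b = slopes * X[:, keep].std(axis=0, ddof=1) / y.std(ddof=1)
        history.append({names[j]: float(s) for j, s in zip(keep, std_b)})
        # Candidate for deletion: smallest absolute standardized coefficient
        weakest = int(np.argmin(np.abs(std_b)))
        reduced = keep[:weakest] + keep[weakest + 1:]
        ss_reg_reduced = fit_ols(X[:, reduced], y)[1] if reduced else 0.0
        # Partial F: regression SS lost by the deletion over the full model's residual mean square
        k = len(keep)
        f_ratio = (ss_reg - ss_reg_reduced) / (ss_dev / (n - k - 1))
        if f_ratio >= f_remove:
            break  # the candidate is significant, so elimination stops here
        keep = reduced  # discard the variable permanently and repeat
    return [names[j] for j in keep], history
```

To apply it to the Kentucky basins, pass the six independent-variable columns of KENTUCKY.TXT as `X` and basin magnitude as `y`. The file's layout is not described in this section, so no loading code is shown.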
Although multiple regression is “multivariate” in the sense that more than one variable is measured on each observational unit, it really is a univariate technique because we are concerned only with the variance of one variable, y. The behavior of the independent variables, the x’s, is not subject to analysis.
The next topic we will consider is discriminant function analysis, which involves identification, or the placing of objects into predefined groups. Discrimination between two alternative groups is a process that is computationally intermediate between univariate procedures and true multivariate methods, in which many variables are considered simultaneously. Two groups, each characterized by a set of multiple variables, can be discriminated by solving a set of simultaneous equations almost identical to those involved in multiple regression. The right-hand vector of the matrix equation, however, does not contain cross products between the independent variables and a single dependent variable. Instead, it contains the differences between the multivariate means of the two groups that are to be discriminated.
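The two-group case can be sketched in a few lines of NumPy. The sketch assumes the common pooled-covariance formulation: the coefficient vector `lam` solves S_p lam = D, where S_p is the pooled variance-covariance matrix and D is the difference between the group mean vectors. The names `discriminant_function` and `classify`, and the midpoint cutoff `r0`, are illustrative choices rather than the book's notation.

```python
import numpy as np


def discriminant_function(group_a, group_b):
    """Return (lam, r0) for a two-group linear discriminant function.

    Rows are observations, columns are variables. lam solves S_p @ lam = D,
    where S_p is the pooled variance-covariance matrix and D is the
    difference between the group mean vectors. r0 is the score of the point
    midway between the two group means (an assumed cutoff for equally
    likely groups).
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    d = mean_a - mean_b  # right-hand vector: difference of multivariate means
    # Pooled covariance: combined within-group sums of squares and cross
    # products divided by the pooled degrees of freedom
    cov_a = np.atleast_2d(np.cov(a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(b, rowvar=False))
    s_pooled = ((n_a - 1) * cov_a + (n_b - 1) * cov_b) / (n_a + n_b - 2)
    lam = np.linalg.solve(s_pooled, d)
    r0 = lam @ (mean_a + mean_b) / 2
    return lam, r0


def classify(x, lam, r0):
    """Assign an observation to group 'A' if its score exceeds r0, else 'B'."""
    return "A" if np.asarray(x, dtype=float) @ lam > r0 else "B"
```

The matrix solved here plays the role of the cross-products matrix in multiple regression. Only the right-hand vector changes, from cross products with y to differences between group means.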
Tests of discriminant functions involve multivariate extensions of simple univariate statistical tests of equality. These will be considered next, followed by a discussion of multivariate classification, or the sorting of objects into homogeneous groups. We will then consider eigenvector techniques, including principal component and factor analysis. The final topics will include multivariate extensions of discriminant analysis and multiple regression.
This list of topics is certainly not all-inclusive. However, the subjects have been chosen because they have found special utility in the Earth sciences. They include a wide variety of computational techniques and encompass many fundamental concepts. An understanding of the theory and operational procedures involved in these methods should provide you with sufficient background to evaluate other multivariate techniques as well.