elaborate on each technique, only factor analysis is described here, because in many
situations it provides the most useful and relevant information for biogeochemical
datasets.
The starting point for factor analysis is a spreadsheet of data (e.g., Excel) with
the data all in a numeric (rather than text) format. The spreadsheet is imported into
the statistical analysis software programme and ‘factor analysis’ is selected from the
drop-down menu. The range of elements (with any quantifiable field parameters) to
be incorporated into the factor analysis is selected and the command ‘extract’ is
given. There is then a choice of methods that can be made, of which the default is
usually ‘Principal Components’ and the default matrix for analysis is the correlation
matrix. In factor analysis, ‘eigenvalues’ are calculated for each factor, and for most purposes a
value of one is an appropriate cut-off level for deciding which factors to retain. The programme needs to know how
many iterations of calculations are required to generate the factor matrix. The
default value of 25 is usually adequate, but for large datasets a higher value may be
required for the solution to converge.
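The extraction step described above can be sketched outside a menu-driven package. The following is a minimal illustration only, assuming NumPy is available and using synthetic data rather than real analyses; the function name `extract_factors` and the parameter `eigen_cutoff` are hypothetical names chosen for this sketch and do not belong to any particular statistical programme. It performs a principal-components extraction on the correlation matrix and retains components with an eigenvalue greater than one.

```python
import numpy as np

def extract_factors(data, eigen_cutoff=1.0):
    """Principal-components extraction from the correlation matrix.

    data: 2-D array, rows = samples, columns = elements (all numeric).
    Returns all eigenvalues (largest first) and the loadings of the
    components whose eigenvalue exceeds eigen_cutoff.
    """
    corr = np.corrcoef(data, rowvar=False)       # element-by-element correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)      # symmetric matrix; eigenvalues ascending
    order = np.argsort(eigvals)[::-1]            # re-order so the largest comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > eigen_cutoff                # eigenvalue cut-off of one by default
    # Scaling each eigenvector by sqrt(eigenvalue) gives loadings, i.e. the
    # correlation of each element with each retained component.
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, loadings

# Synthetic data (an assumption for illustration): six "elements" driven by
# two independent underlying signals, three elements per signal.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
data = np.column_stack(
    [a + 0.3 * rng.normal(size=200) for _ in range(3)]
    + [b + 0.3 * rng.normal(size=200) for _ in range(3)]
)

eigvals, loadings = extract_factors(data)
print(eigvals.round(2))
print(loadings.round(2))
```

Because the correlation matrix is used, the eigenvalues sum to the number of elements, which is why a value of one (the variance contributed by a single standardized element) is a natural cut-off. No iteration limit appears in this sketch because a direct eigen-decomposition needs none; the iteration setting in a statistical package applies to its own iterative routines.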
Factor analysis is used to explain variability among observed random variables in
terms of fewer unobserved random variables called factors. The observed variables
are modelled as linear combinations of the factors, plus error terms, and the factors help to provide insight
into the structure of large amounts of data. The process of factor analysis can be
visualized as examining a dataset in n-dimensional space and inserting axes (typically
orthogonal) into the data matrix to optimize the fit and thereby explain the variance
in the data. This generates coordinates (from −1 to +1) with loadings according to
the strength of correlation. The first iteration extracts the factor that explains the
greatest proportion of the data variability, expressed as a percentage, and is represented by
a list of coordinates in factor (or component) 1. Once this variability has been
extracted from the dataset, a new calculation is automatically generated to extract
the combination of elements that accounts for the next largest percentage of the data
variability. Commonly, about 7–10 factors will be extracted by this technique and
these usually account for 70–90% of the data variability. The sort of information
gleaned from this process may be the isolation of elements related to plant structure (e.g.,
Ca, K, Zn, Cu) from those related to plant nutrition (e.g., B, Mo, P), and perhaps
data related to mineralization (e.g., Au, As, Sb, Bi, Ag, Hg, U), a carbonate-rich
substrate (Ca, Sr, Ba, Mg), or mafic bedrock (Ni, Co, Cr, Mg). There is commonly,
too, a significant factor involving Fe and associated elements such as REE, Al, Hf,
Sc, Ti and Hg.
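The percentage of variability attributed to each factor, and the grouping of elements by their loadings, can be illustrated with a short sketch. The eigenvalues and loadings below are invented purely for illustration and are not results from any dataset; the function names and the loading threshold of 0.6 are likewise assumptions made for this example, not values prescribed by the method.

```python
import numpy as np

def variance_explained(eigvals):
    """Percentage of total variability per factor, and the running total."""
    eigvals = np.asarray(eigvals, dtype=float)
    pct = 100.0 * eigvals / eigvals.sum()
    return pct, np.cumsum(pct)

def dominant_elements(loadings, names, threshold=0.6):
    """For each factor, list the elements whose |loading| reaches the threshold."""
    groups = []
    for j in range(loadings.shape[1]):
        column = loadings[:, j]
        groups.append([n for n, v in zip(names, column) if abs(v) >= threshold])
    return groups

# Invented eigenvalues for six elements (they sum to 6, as they would for a
# correlation matrix); illustrative only.
eigvals = np.array([2.7, 1.8, 0.6, 0.4, 0.3, 0.2])
pct, cumulative = variance_explained(eigvals)

# Invented loadings on the first two factors; illustrative only.
names = ["Au", "As", "Sb", "Ca", "Sr", "Ba"]
loadings = np.array([
    [0.90, 0.10],
    [0.85, 0.05],
    [0.80, -0.10],
    [0.10, 0.90],
    [0.00, 0.80],
    [-0.05, 0.75],
])
groups = dominant_elements(loadings, names)

print(pct.round(1), cumulative.round(1))
print(groups)
```

In this invented case the first two factors carry 45% and 30% of the variability, and the loadings separate a hypothetical mineralization association (Au, As, Sb) from a hypothetical carbonate association (Ca, Sr, Ba). Real loadings are rarely so clean, and the choice of threshold is a matter of judgement.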
It should be appreciated that results from factor analysis are not to be considered
a definitive solution. The data that are entered are invariably of mixed quality with
respect to analytical precision, because the data for some elements are considerably
more precise than data for others. The uncertainty surrounding results from a factor
analysis becomes evident if the user generates several rotated factor solutions using
first a complete array of elements, then removing some of those elements and
re-calculating the factors for the remaining matrix. Commonly, there may be some
Biogeochemistry in Mineral Exploration