
Data-Driven Modeling of Mineral Prospectivity 265
every deposit-type, proxy deposit-type and non-deposit location. By modeling a
mathematical relationship between a set of mineral occurrence scores ($Y_i$) and multiple
sets of $MOFS_{ji}$ of spatial data at deposit-type, proxy deposit-type and non-deposit
locations, weak dissimilarities in multivariate spatial data signatures of deposit-type
locations and weak to moderate dissimilarities in multivariate spatial data signatures of
proxy deposit-type locations, as indicated in Fig. 8-6, can be enhanced. This allows
distinction between coherent and non-coherent deposit-type locations and between
coherent and non-coherent proxy deposit-type locations. Then a threshold $\hat{Y}_i$ can be
sought to distinguish between coherent and non-coherent deposit-type locations and
between coherent and non-coherent proxy deposit-type locations.
Because the mineral occurrence score, $Y_i$, is a dichotomous variable, logistic
regression is appropriate in modeling the relationship between $Y_i$ and $MOFS_{ji}$ in order to
derive $\hat{Y}_i$ in the unit range [0,1], viz. (Rock, 1988a; Hosmer and Lemeshow, 2000):
$$\hat{Y}_i = \frac{1}{1 + e^{-(b_0 + b_1 MOFS_{1i} + \cdots + b_j MOFS_{ji} + \cdots + b_m MOFS_{mi})}} \tag{8.3}$$
where $b_0$ is a constant and $b_j$ is the coefficient of the $j$th ($j$ = 1, 2, …, $m$) $MOFS_{ji}$ independent
variable. In logistic regression, the relationship between the dependent and independent
variables is not a linear function. Data of independent variables used in logistic
regression can be of any form; they can be dichotomous, nominal, interval or ratio
variables (Hosmer and Lemeshow, 2000). Logistic regression makes no assumption
about the distribution of data of independent variables; they do not have to be normally
distributed, linearly related or of equal variance. However, for any of the $i$th ($i$ = 1, 2, …, $n$)
cases (e.g., deposit-type or non-deposit locations) with missing values for at least one of
the $j$th ($j$ = 1, 2, …, $m$) independent variables (in this case $MOFS_{ji}$ for the geochemical data;
see Fig. 5-12), it is very difficult, if not impossible, to estimate $\hat{Y}_i$. Current solutions to
the problem of missing data of independent variables in logistic regression are still
somewhat controversial and not yet routine (Rubin, 1996; Allison, 2002; Paul et al.,
2003). For the case study, deposit-type and non-deposit locations without geochemical
data are simply assigned a
MOFS of [0].
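
As a minimal sketch of how Eq. 8.3 is evaluated for one location, the Python function below computes $\hat{Y}_i$ from a constant, a list of coefficients and a list of MOFS values. The function name, the use of `None` to mark a missing value and the numbers in the usage lines are assumptions made for illustration only; they are not taken from the case study. Missing values are replaced by 0, mirroring the simple assignment described above.

```python
import math


def predicted_occurrence_score(b0, coefficients, mofs):
    """Evaluate Eq. 8.3 for a single location i.

    b0           -- the constant term
    coefficients -- b_1..b_m, one per independent variable
    mofs         -- MOFS_1i..MOFS_mi; None marks a missing value (assumption)
    """
    if len(coefficients) != len(mofs):
        raise ValueError("need exactly one MOFS value per coefficient")

    # Missing MOFS values are assigned 0, as done for locations
    # without geochemical data in the case study.
    filled = [0.0 if value is None else value for value in mofs]

    # Linear combination b0 + b1*MOFS_1i + ... + bm*MOFS_mi
    linear = b0 + sum(b * x for b, x in zip(coefficients, filled))

    # Logistic transform, written in a form that avoids overflow
    # when the linear term is strongly negative.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)


# Hypothetical coefficients and MOFS values; the second MOFS is missing.
example_score = predicted_occurrence_score(-2.0, [1.5, 2.5], [0.8, None])
```

Whatever the coefficients, the result lies in the unit range [0,1], so it can be compared directly with a threshold.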
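The logistic regression coefficients ($b_j$) of the $j$th ($j$ = 1, 2, …, $m$) $MOFS_{ji}$ independent
variable are determined via the maximum likelihood method (Cox and Snell, 1989),
whereby the coefficients are chosen to maximise the likelihood of the observed $Y_i$ given
the predicted $\hat{Y}_i$ (rather than to minimise the squared difference between them, as in
least-squares regression), and the fitted model is tested for
goodness-of-fit (e.g., via the Hosmer-Lemeshow test (Hosmer and Lemeshow, 2000)).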
Because the relationship between independent and dependent variables is not a linear
function in logistic regression, the coefficients $b_j$ may not have straightforward
interpretations as they do in ordinary linear regression (Rock, 1988a). Thus, it is
imperative to test the statistical significance of logistic regression coefficients (e.g.,
using the Wald statistic; Menard, 2001). In addition, a backward stepwise logistic
regression is instructive in eliminating independent variables that do not contribute
significantly (e.g., at the 90% level) to the logistic regression.
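
The estimation and significance test described above can be sketched as follows, under simplifying assumptions. The code fits a model with a constant and a single MOFS variable by Newton-Raphson maximisation of the log-likelihood $\sum_i [Y_i \ln \hat{Y}_i + (1 - Y_i)\ln(1 - \hat{Y}_i)]$, then computes the Wald statistic as the squared ratio of a coefficient to its standard error. The function names and the ten data values are hypothetical, chosen only so that the example runs; a real application would use several MOFS variables and a statistics package.

```python
import math


def sigmoid(t):
    """Logistic function, written to avoid overflow for negative t."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def score_and_information(b0, b1, x, y):
    """Gradient of the log-likelihood and entries of the 2x2 information matrix."""
    p = [sigmoid(b0 + b1 * xi) for xi in x]
    g0 = sum(yi - pi for yi, pi in zip(y, p))
    g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
    w = [pi * (1.0 - pi) for pi in p]
    i00 = sum(w)
    i01 = sum(wi * xi for wi, xi in zip(w, x))
    i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    return g0, g1, i00, i01, i11


def fit_logistic_single_mofs(x, y, max_iter=50, tol=1e-10):
    """Maximum likelihood estimates (b0, b1) and their standard errors."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1, i00, i01, i11 = score_and_information(b0, b1, x, y)
        det = i00 * i11 - i01 * i01
        # Newton-Raphson step: inverse information matrix times gradient.
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    # Standard errors from the inverse information matrix at the estimates.
    _, _, i00, i01, i11 = score_and_information(b0, b1, x, y)
    det = i00 * i11 - i01 * i01
    return (b0, b1), (math.sqrt(i11 / det), math.sqrt(i00 / det))


# Hypothetical MOFS values and occurrence scores (1 = deposit, 0 = non-deposit).
mofs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

(b0, b1), (se0, se1) = fit_logistic_single_mofs(mofs, y)
wald_b1 = (b1 / se1) ** 2
# 2.706 is the 90% critical value of chi-square with 1 degree of freedom.
significant_at_90 = wald_b1 > 2.706
```

In a backward stepwise procedure, a variable whose Wald statistic falls below the chosen critical value would be a candidate for removal before the model is refitted. The sketch stops short of that loop because it handles only one variable.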