validation. Many forest modellers have adopted such an approach. It is interesting
to note that the reported outcomes of such an approach always appear the same:
the model is good on both the fitting and the validation data sets. There is hardly
any reporting of a failed model from such an approach, although in reality many
models are found to behave undesirably in application.
What is wrong here? The ‘trick’ lies in the ‘data-splitting scheme’. Because the split
data sets are not independent of each other, the data-splitting scheme used in model
validation does not validate the fitted model; instead, it validates the sampling tech-
nique used to partition the data. Since there is no standard procedure for how the
two portions of the data should be partitioned, various alternatives can be used:
1. The data can be split by 50–50%, 75–25%, 80–20% or any other proportion as the
modeller sees fit.
2. Various sampling techniques can also be used to derive two representative sam-
ple portions.
3. Assuming that the data were to be split 50–50% randomly, the modeller could
repeatedly split the data in endless ways to derive the ‘correct’ 50–50% split (see the
sketch following this list). There is a strong possibility that, in practice, this kind of
approach can easily be misused because of its lack of consistency and repeatability.
In fact, it may also be ‘manipulated’, as one can keep sampling until the ‘right’
sample comes up.
4. The sampling proportion is large. Even for an 80–20% split, 20% of the popula-
tion is sampled for model validation. This is a huge percentage considering that
most forest inventories in Alberta and elsewhere sample much less than 1% of
the population. Sampling 50% of the population in a 50–50% split is halfway to a
‘census’ instead of a ‘sample’. It provides favourable proportions for mirroring
the population either way, leading to the best possible illusion of model validation.
In addition to data splitting, other procedures might also be used to evaluate
the goodness of model prediction. These include: the conditional mean squared
error of prediction (C_p), the PRESS statistic, Amemiya’s statistic, various resampling
methods with the funny names of ‘bootstrap’ and ‘jackknife’, and Monte Carlo sim-
ulations (Judge et al., 1988; Davidson and MacKinnon, 1993). All these procedures are
correct in their own right. For instance, the C_p and Amemiya’s statistic are similar to
other goodness-of-fit measures. The PRESS statistic is similar to data splitting. The
resampling methods are used when appropriate sampling results are not available
and one needs a non-parametric method of estimating measures of precision (Judge
et al., 1988). By resampling the estimated errors after a model has been fitted
to the data, ‘pseudo sample data’ are generated to emulate the modelling data,
permitting the model to be re-fitted. Monte Carlo simulations involving the pseudo
sample data are used to approximate the unobservable sampling distributions and
provide information on simulated sampling variability, confidence intervals, biases,
etc. All these procedures can provide some informative statistics and can be of use
for looking at a model from different angles, but their utility in model validation is
quite dubious and not clearly understood, for they are heavily dependent on the
model-fitting data. This dependence is not consistent with the prerequisite of vali-
dating a model on independent data set(s).
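These procedures can be sketched as follows, again on the synthetic data and hypothetical model used above rather than on anything from this chapter: the PRESS statistic is computed via the leave-one-out leverage shortcut for ordinary least squares, and the residual (‘pseudo sample data’) bootstrap is read out through the Monte Carlo spread of the re-fitted coefficients.

```python
# A minimal sketch (synthetic data, hypothetical model) of two procedures
# named in the text: the PRESS statistic and the residual bootstrap.
import numpy as np

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(10, 120, n)
height = 30 * (1 - np.exp(-0.03 * age)) + rng.normal(0, 1.5, n)

X = np.column_stack([np.ones(n), np.log(age)])   # simple height-age model
beta_hat, *_ = np.linalg.lstsq(X, height, rcond=None)
resid = height - X @ beta_hat

# PRESS: sum of squared leave-one-out prediction errors. For ordinary least
# squares no explicit re-fitting is needed: e_i / (1 - h_ii), where h_ii is
# the leverage (diagonal of the hat matrix H = X (X'X)^-1 X').
h_diag = np.sum(X * np.linalg.solve(X.T @ X, X.T).T, axis=1)
press = np.sum((resid / (1.0 - h_diag)) ** 2)
print(f"PRESS = {press:.2f}")

# Residual bootstrap: pseudo responses = fitted values + resampled residuals;
# re-fit, then use the Monte Carlo spread as an approximate sampling distribution.
boot = np.array([
    np.linalg.lstsq(X, X @ beta_hat + rng.choice(resid, size=n, replace=True),
                    rcond=None)[0]
    for _ in range(2000)
])
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)   # approximate 95% CIs
for name, l, h in zip(["intercept", "log(age) slope"], lo, hi):
    print(f"{name}: [{l:.3f}, {h:.3f}]")
# Note: every pseudo sample above is built from the model-fitting data,
# which is exactly the dependence the text objects to.
```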
While recognizing that data splitting is useful for other purposes (Picard and
Berk, 1990), it was felt that, because of the variations and the potential subjectivity
related to data splitting (re-creation), the practice of splitting the data into two parts
should not be used further in validating forestry models, for the reserved data are
not independent of the modelling data and there are numerous ways in which the
data can be chosen to substantiate a modeller’s own objectives and, sometimes, bias.
In some ways, the fact that there is hardly any reporting of a failed model from this