SEDIMENT YIELD FROM WATERSHEDS 7.27
However, a regression fitted to log-transformed data minimizes the sum of the squared
deviations of the logarithms, which is not the same as minimizing the sum of the squared
deviations of the original values. When predictions are back-transformed, this introduces a
bias that underestimates the concentration (or load) at any given discharge. To illustrate
this effect, consider the values 10 and 90, which have an arithmetic mean of 50. If these
values are log-transformed, their logs averaged, and the average back-transformed by taking
its antilog, the result is 30 (the geometric mean). Because a geometric mean computed from
log transforms is always lower than the arithmetic mean (the two are equal only when all
values are identical), the result is a negative bias whose magnitude increases with the
degree of scatter about the regression. Ferguson (1986) reports that this bias may cause
underestimation by as much as 50 percent. Ferguson and others have suggested bias
correction factors, but their appropriateness is uncertain (Glysson, 1987; Walling and
Webb, 1988).
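The 10-and-90 example can be checked numerically. The short Python sketch below computes both means and also evaluates the lognormal correction factor exp((ln 10)² s²/2), approximately exp(2.651 s²), for a standard error s expressed in log10 units. That factor is the form commonly associated with Ferguson (1986); it assumes normally distributed log residuals and is included only to show the size of the adjustment, not to endorse it given the reservations cited above. The function names are illustrative.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # Average the log10 values, then back-transform the average with the antilog
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log

def log10_bias_correction(std_error_log10):
    # Lognormal correction for a back-transformed log10 estimate:
    # exp((ln 10)^2 * s^2 / 2), i.e. roughly exp(2.651 * s^2).
    # Assumes normally distributed residuals in log10 space.
    return math.exp((math.log(10) ** 2) * std_error_log10 ** 2 / 2)

values = [10.0, 90.0]
print(arithmetic_mean(values))            # 50.0
print(round(geometric_mean(values), 6))   # 30.0
# Multiplier that would be applied to back-transformed predictions for s = 0.3
print(log10_bias_correction(0.3))
```

The correction grows quickly with scatter: because s enters squared in the exponent, doubling the log-space standard error roughly quadruples the exponent of the factor.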
Alternatives to the log transformation are now widely available with microcomputer
technology. McCuen (1993) describes several alternative procedures and supplies software
diskettes containing programs and sample datasets that fit the power model so that the
error term is minimized on the original, unlogged data. Jansson (1992a) used another
method at Cachí reservoir to test how well different models fit the original dataset of
instantaneous concentration-discharge pairs. The total load for the original dataset was
first computed by assigning an arbitrary duration interval to each sample point. The total
load was then recomputed with each model (regression, visual fit, etc.) using all the
discharge values in the original dataset, to see how accurately each model reproduced the
total load for the sampled period.
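Both ideas in the preceding paragraph, fitting the power model C = aQ^b on unlogged data and comparing total loads in the manner Jansson describes, can be sketched together. This is not McCuen's software or Jansson's exact procedure; it is a minimal standard-library illustration that assumes C in mg/L, Q in m³/s, a hypothetical synthetic dataset, and the same arbitrary one-day duration assigned to every sample. The unlogged fit uses the fact that, for a fixed exponent b, the least-squares coefficient a has a closed form; b is then found by golden-section search, which assumes the squared error is unimodal in b over the chosen bracket.

```python
import math
import random

def fit_log_regression(q, c):
    """OLS on log10(C) = log10(a) + b*log10(Q), back-transformed with no bias correction."""
    x = [math.log10(v) for v in q]
    y = [math.log10(v) for v in c]
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    b = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sum((xi - xm) ** 2 for xi in x)
    return 10 ** (ym - b * xm), b

def fit_power_unlogged(q, c, b_lo=0.0, b_hi=4.0, tol=1e-8):
    """Least squares on unlogged data: minimize sum((C - a*Q**b)**2)."""
    def best_a(b):
        # For fixed b the model is linear in a, so the optimal a is closed-form
        return sum(ci * qi ** b for qi, ci in zip(q, c)) / sum(qi ** (2 * b) for qi in q)

    def sse(b):
        a = best_a(b)
        return sum((ci - a * qi ** b) ** 2 for qi, ci in zip(q, c))

    g = (math.sqrt(5) - 1) / 2          # golden-ratio shrink factor
    lo, hi = b_lo, b_hi
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = sse(x1), sse(x2)
    while hi - lo > tol:
        if f1 < f2:                     # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = sse(x1)
        else:                           # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = sse(x2)
    b = (lo + hi) / 2
    return best_a(b), b

def total_load_tonnes(q, c, dt_seconds):
    # mg/L x m3/s = g/s; multiply by the interval length and convert grams to tonnes
    return sum(qi * ci for qi, ci in zip(q, c)) * dt_seconds / 1e6

def model_load_tonnes(q, a, b, dt_seconds):
    # Recompute the load from the rating curve at every sampled discharge
    return total_load_tonnes(q, [a * qi ** b for qi in q], dt_seconds)

# Hypothetical data: power law with lognormal scatter (0.3 log10 units)
random.seed(1)
q = [10 ** random.uniform(0, 2.5) for _ in range(200)]
c = [5.0 * qi ** 1.2 * 10 ** random.gauss(0, 0.3) for qi in q]
dt = 86400.0  # arbitrary duration assigned to each sample

observed = total_load_tonnes(q, c, dt)
for name, (a, b) in [("log regression", fit_log_regression(q, c)),
                     ("unlogged fit", fit_power_unlogged(q, c))]:
    ratio = model_load_tonnes(q, a, b, dt) / observed
    print(f"{name}: a={a:.3g}, b={b:.3f}, load ratio={ratio:.2f}")
```

Comparing each printed load ratio with 1 shows how closely each curve reproduces the sampled-period load, which is the test described for Cachí reservoir.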
A particular weakness of a mathematically fitted curve is the potentially poor fit at the
high-discharge extreme, which is represented by few data points. Large errors can occur when
mathematical curves (e.g., a log-quadratic) are extrapolated to discharge values greater
than those covered in the original dataset, producing unreasonable values. All rating
curves should be plotted and examined for reasonableness over the entire range of
discharge values to which they will be applied.
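A simple numerical screen can complement, but not replace, plotting the curve. The hypothetical helper below evaluates a log-quadratic rating curve, log10 C = a0 + a1 log10 Q + a2 (log10 Q)², flags application beyond the largest calibrated discharge, and locates the turning point Q* = 10^(-a1/(2 a2)), where the slope of the curve in log-log space is zero. The coefficients, units (Q in m³/s, C in mg/L), and function names are assumptions for illustration only.

```python
import math

def log_quadratic(q, a0, a1, a2):
    """Hypothetical rating curve: log10(C) = a0 + a1*log10(Q) + a2*log10(Q)**2."""
    x = math.log10(q)
    return 10 ** (a0 + a1 * x + a2 * x * x)

def screen_log_quadratic(a0, a1, a2, q_cal_max, q_apply_min, q_apply_max):
    """Return warnings about extrapolation and turning points in the application range."""
    warnings = []
    if q_apply_max > q_cal_max:
        warnings.append(f"extrapolated beyond calibrated Q={q_cal_max:g} up to Q={q_apply_max:g}")
    if a2 != 0:
        # d(log C)/d(log Q) = a1 + 2*a2*log10(Q) is zero at the vertex of the parabola
        q_turn = 10 ** (-a1 / (2 * a2))
        if q_apply_min < q_turn < q_apply_max:
            shape = "peaks and then declines" if a2 < 0 else "reaches a minimum and then rises"
            warnings.append(f"concentration {shape} at Q={q_turn:.3g}")
    return warnings

# Hypothetical coefficients; tabulate C over the range where the curve will be applied
coef = dict(a0=1.0, a1=2.0, a2=-0.4)
for q in (1, 10, 100, 316, 1000):
    print(f"Q={q:>5}  C={log_quadratic(q, **coef):10.1f}")
for w in screen_log_quadratic(**coef, q_cal_max=200.0, q_apply_min=1.0, q_apply_max=1000.0):
    print("warning:", w)
```

A warning is only a prompt for inspection: whether a declining concentration at high discharge is reasonable still has to be judged from the plotted data and the physical setting.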
The infrequent large-discharge events that account for most of the sediment load
constitute only a few points in the entire sediment dataset. As a result, the shape of a
regression fitted to the original dataset will be biased by the numerous data points at low
discharge, which account for a small percentage of the sediment load. This problem can be
overcome by dividing the data into discharge classes, computing the mean sediment
concentration within each discharge class, and then running the regression on the class
means. This technique weights the error minimization equally over the entire range of the
dataset.
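The discharge-class procedure can be sketched as follows. The class width (0.25 log10 units), the use of logarithmic rather than arithmetic classes, the minimum sample count per class, and the log-log regression on the class means are assumptions made for illustration; the text does not prescribe them. Each class contributes a single point, so the many low-flow samples no longer dominate the fit.

```python
import math
import random
from collections import defaultdict

def class_means(q, c, width_log10=0.25, min_count=1):
    """Group samples into classes of equal width in log10(Q) and return
    (mean Q, mean C, count) for each class, ordered by discharge."""
    groups = defaultdict(list)
    for qi, ci in zip(q, c):
        groups[math.floor(math.log10(qi) / width_log10)].append((qi, ci))
    out = []
    for k in sorted(groups):
        pts = groups[k]
        if len(pts) >= min_count:  # drop sparsely populated classes
            n = len(pts)
            out.append((sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n, n))
    return out

def fit_loglog(xs, ys):
    """OLS of log10(y) on log10(x); returns (a, b) for y = a * x**b."""
    lx = [math.log10(v) for v in xs]
    ly = [math.log10(v) for v in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    return 10 ** (my - b * mx), b

# Hypothetical skewed sample: most discharges are low, a few are high
random.seed(2)
q = [10 ** (2.5 * random.random() ** 3) for _ in range(300)]
c = [5.0 * qi ** 1.2 * 10 ** random.gauss(0, 0.3) for qi in q]

a_all, b_all = fit_loglog(q, c)
classes = class_means(q, c, width_log10=0.25, min_count=3)
a_cls, b_cls = fit_loglog([m[0] for m in classes], [m[1] for m in classes])
print(f"all points: a={a_all:.3g}, b={b_all:.3f}")
print(f"class means ({len(classes)} classes): a={a_cls:.3g}, b={b_cls:.3f}")
```

Averaging concentrations within a class before taking logs also avoids back-transforming a mean of logarithms within each class, although the final regression on the class means is still fitted in log space here.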
7.4.4 Rating Curve Example
As an example, the 8-year C-Q record from the 21-km² Goodwin Creek watershed, Mississippi,
was divided into 0.076-m (0.25-ft) stage intervals using discharge and suspended sediment
data pairs from a gaging flume equipped with a pumped sampler. The data were separated
into fine and sand fractions, and the mean suspended sediment concentration for each
fraction was determined for each stage interval to produce the plot in Fig. 7.20,
which illustrates the clearly separate relationships for sand and fines. Although the
resulting relationship appears to imply a high degree of correlation between discharge
and concentration, the original dataset displays considerable scatter. The standard
deviation of concentration within a stage interval is about as large as the average
concentration itself. Applying the averaged data may accurately determine the long-term
load, but estimates for individual storms would contain large errors. In this