the semivariances are less reliable. Skewness can result from a long upper or
lower tail in the underlying process or from the presence of a secondary process
that contaminates the primary process; values from the latter may appear as
outliers. Kerry and Oliver (2007a, 2007b) have studied the effects of asymmetry
in the underlying process and of outliers on the variogram using simulated
fields. We summarize their results below.
Methods of estimating variograms reliably from skewed data have been
sought, and it is clear that the cause of asymmetry affects what one should
do. If the skewness coefficient exceeds the bounds given above, then the
histogram or box-plot should be examined to reveal the detail of the asymmetry.
In addition to these usual graphical methods, you can identify exceptional
contributions to the semivariances by drawing an h-scattergram for a given
lag, h. As described in Chapter 4, an h-scattergram is a graph in which the $z(x)$
are plotted against the $z(x + h)$ with which they are compared in computing
$\hat{\gamma}(h)$. In general, the plotted points appear as more or less inflated clusters, as in
the usual kind of scatter graph.
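
As an illustration only, the pairs plotted in an h-scattergram can be assembled for a regular grid as sketched below. The function name `h_scatter_pairs`, the 20 x 20 field of independent normal values and the planted outlier are all hypothetical and are not taken from Kerry and Oliver's simulations. The sketch assumes data on a complete rectangular grid, with the lag taken along one grid direction.

```python
import numpy as np

def h_scatter_pairs(z, lag_steps=1, axis=1):
    """Return the paired values z(x) and z(x + h) on a regular 2-D grid.

    The lag h is `lag_steps` grid intervals (an integer >= 1) along `axis`
    (1 = along rows, 0 = down columns). Pairs with a missing value are dropped.
    """
    z = np.asarray(z, dtype=float)
    if axis == 0:
        z = z.T
    tail = z[:, :-lag_steps].ravel()   # z(x)
    head = z[:, lag_steps:].ravel()    # z(x + h)
    keep = ~(np.isnan(tail) | np.isnan(head))
    return tail[keep], head[keep]

# Hypothetical data: independent normal values with one planted outlier.
rng = np.random.default_rng(0)
field = rng.normal(size=(20, 20))
field[5, 7] = 20.0

tail, head = h_scatter_pairs(field, lag_steps=1)

# Each pair contributes half its squared difference to the semivariance,
# so the mean of the contributions is the usual estimate at this lag.
contrib = 0.5 * (head - tail) ** 2
gamma_hat = contrib.mean()

# The pairs with the largest contributions are the points that lie
# farthest from the diagonal of the h-scattergram.
top = np.argsort(contrib)[-2:]
for i in top:
    print(f"z(x) = {tail[i]:.2f}, z(x + h) = {head[i]:.2f}, "
          f"contribution = {contrib[i]:.1f}")
```

Plotting `tail` against `head` with any scatter-plot routine gives the h-scattergram itself. Ranking the contributions is one simple way to see which pairs dominate the estimate at that lag.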
Underlying asymmetry or skewness
Where asymmetry arises from a long tail, especially a long upper tail, in the
distribution, ‘standard best practice’ has been to transform the data, as described
in Chapter 2. The variogram is then computed on the transformed data.
Transformation is not essential, however; the variogram computed from the
original data and predictions using it are unbiased, though they are not
necessarily the most precise. Perhaps more surprising is that the characteristic
form of the variogram may be changed little by transformation. So, you should
examine the experimental variograms of both raw and transformed data before
deciding which to work with.
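
The comparison of raw and transformed data might be sketched as follows. This is a minimal example under stated assumptions, not the procedure used by Kerry and Oliver: the field is a hypothetical lognormal one built from a moving average of random numbers, the transformation is assumed to be to natural logarithms, and the function names `skewness` and `experimental_variogram` are ours. Each variogram is divided by the variance of its data so that the two shapes can be compared on a common scale.

```python
import numpy as np

def skewness(z):
    """Coefficient of skewness: third central moment over variance^(3/2)."""
    z = np.asarray(z, dtype=float).ravel()
    d = z - z.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

def experimental_variogram(z, max_lag):
    """Method-of-moments semivariances on a regular grid.

    Lags run from 1 to `max_lag` grid intervals; comparisons along rows
    and down columns are pooled (an isotropic variogram is assumed).
    """
    z = np.asarray(z, dtype=float)
    gamma = []
    for k in range(1, max_lag + 1):
        d_rows = z[:, k:] - z[:, :-k]
        d_cols = z[k:, :] - z[:-k, :]
        sq = np.concatenate([d_rows.ravel() ** 2, d_cols.ravel() ** 2])
        gamma.append(0.5 * sq.mean())
    return np.array(gamma)

# Hypothetical skewed field: a 5 x 5 moving average of white noise gives a
# spatially correlated normal field; exponentiating gives a long upper tail.
rng = np.random.default_rng(1)
noise = rng.normal(size=(44, 44))
smooth = sum(noise[i:i + 40, j:j + 40] for i in range(5) for j in range(5)) / 5.0
raw = np.exp(smooth)
logged = np.log(raw)   # assumed transformation

for name, data in (("raw", raw), ("log", logged)):
    g = experimental_variogram(data, max_lag=8) / data.var()
    print(name, "skewness:", round(skewness(data), 2))
    print(name, "scaled semivariances:", np.round(g, 2))
```

If the two scaled variograms have much the same form, the choice between raw and transformed data rests on other grounds, such as the precision of the predictions.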
Kerry and Oliver (2007a) explored the effects of varying skewness and sample
size, and of different transformations on random fields created by simulated
annealing (see Chapter 12 for a description of the method). They simulated
values on a square 5-m grid of 1600 points from a spherical function (equation
(5.24)), with a range, $a$, of 75 m, a total sill variance, $c_0 + c$, of 1, and
nugget:sill ratios of 0, 0.25, 0.5, 0.75 and 1. They simulated similar fields of
400 points and 100 points with grid intervals of 10 m and 20 m, respectively.
Values in the fields were raised to a power, $\alpha$, to create a long upper tail in the
distribution. Five values of $\alpha$ were used to give skewness coefficients, $g_1$, of 0.5,
1.0, 1.5, 2.0 and 5.0. Here we illustrate what can happen with their results for
$a = 75$ m, $c_0 = 0$ and $c = 1$.
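
For reference, the generating function with the parameter values quoted above can be evaluated as in the sketch below. It uses the standard form of the spherical model, which we take to be that of equation (5.24); the function name `spherical_variogram` and the choice of lags are ours. The nugget variance for each nugget:sill ratio is obtained by assuming that the total sill, $c_0 + c$, is held at 1.

```python
import numpy as np

def spherical_variogram(h, c0, c, a):
    """Spherical model with nugget c0, spatially correlated sill c and range a.

    gamma(h) = c0 + c * (1.5 * h / a - 0.5 * (h / a) ** 3)   for 0 < h <= a
             = c0 + c                                       for h > a
             = 0                                            for h = 0
    """
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)           # the model is flat beyond the range
    gamma = c0 + c * (1.5 * r - 0.5 * r ** 3)
    return np.where(h > 0, gamma, 0.0)   # the nugget applies only for h > 0

total_sill = 1.0   # c0 + c
range_a = 75.0     # metres
lags = np.arange(0.0, 105.0, 5.0)        # illustrative lags at 5 m intervals

for ratio in (0.0, 0.25, 0.5, 0.75, 1.0):
    c0 = ratio * total_sill
    c = total_sill - c0
    gamma = spherical_variogram(lags, c0, c, range_a)
    print(f"nugget:sill = {ratio:.2f}  gamma(10 m) = {gamma[2]:.3f}  "
          f"gamma(75 m) = {gamma[15]:.3f}")
```

With a nugget:sill ratio of 1 the model is pure nugget, and the semivariance equals the sill at every lag greater than zero.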
Figure 6.1 shows the h-scattergrams at lag 10 m (lag 1) from four fields
simulated on a 10 m grid. Each field has a unique coefficient of skewness,
$g_1 = 0$, 1.0, 1.5 and 2.0, caused by underlying asymmetry. The scatter of points
for the normal distribution is clustered fairly tightly along the diagonal line in
Figure 6.1(a). As the coefficient of skewness increases, the scatter becomes more