232 Random and systematic errors
16.3.3 Averaging data when $\chi^2_{\mathrm{red}} \gg 1$
When the variation in a sample is mainly due to environmental effects,
such as crystal-packing effects on bond distances, the mean should be
calculated using (16.1). The standard deviation, σ(sample), should be
calculated using (16.2), i.e. the sample standard deviation should be
quoted, not the standard deviation on the mean. Taylor and Kennard
(1983) argue that, if each measurement has its own standard deviation,
it is better to estimate the standard deviation using
$$\sigma^2 = \sigma^2(\mathrm{sample}) - \overline{\sigma^2(x_i)}, \qquad (16.15)$$
though the second term (the average variance of the measurements) is
usually so much smaller than the first that it makes little difference.
The C=N bond distances in adenine derivatives, for example, appear
to be rather insensitive to crystal-packing forces, and this may be
described as a ‘hard’ geometrical parameter. Other parameters, such as
metal–metal bond lengths in clusters, bond angles, torsion angles, and
intermolecular contact distances, are much more variable and subject to
environmental effects. Such parameters may be described as ‘soft’. It is
important to remember that an average value is meaningless for a set
of parameters that are not really equivalent (i.e. they do not belong to
the same normal distribution). Even for bonds that appear to be chemically
similar, statistical equivalence may not be found. In such cases, it
is better to quote a range of values, but if you feel driven to calculate
an average anyway, use (16.1) for the mean and σ(sample) from (16.2) for its
standard deviation.
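As an illustration of the correction in (16.15), the following minimal Python sketch computes the mean, the sample standard deviation, and the corrected standard deviation. It assumes that (16.1) is the unweighted mean and that (16.2) is the sample standard deviation with an n − 1 denominator; the function names and the bond-length values are hypothetical and are not taken from any real structure.

```python
import math


def sample_statistics(values):
    """Unweighted mean and sample standard deviation.

    Assumption: these correspond to (16.1) and (16.2), with an
    (n - 1) denominator for the variance.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return mean, math.sqrt(variance)


def corrected_sigma(values, sigmas):
    """Apply (16.15): sample variance minus the mean measurement variance.

    Returns (mean, sigma_sample, sigma_corrected). If the difference of
    variances is negative, sigma_corrected is returned as NaN.
    """
    if len(values) != len(sigmas):
        raise ValueError("values and sigmas must have the same length")
    mean, sigma_sample = sample_statistics(values)
    mean_measurement_variance = sum(s ** 2 for s in sigmas) / len(sigmas)
    variance = sigma_sample ** 2 - mean_measurement_variance
    sigma_corrected = math.sqrt(variance) if variance >= 0 else math.nan
    return mean, sigma_sample, sigma_corrected


if __name__ == "__main__":
    # Hypothetical 'soft' distances (in angstroms) and their individual sigmas.
    distances = [2.50, 2.60, 2.70, 2.80]
    sigmas = [0.01, 0.01, 0.01, 0.01]
    print(corrected_sigma(distances, sigmas))
```

With values like these, where the spread between measurements is much larger than the individual standard deviations, the corrected value differs only slightly from σ(sample), in line with the remark that the second term usually makes little difference.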
16.4 Weighting schemes
Weighting schemes occur throughout crystallographic calculations,
such as merging of data, least-squares refinement, and analysis of
results. In the following section we will discuss the use of weighting
schemes in refinement.
It may seem odd to start discussing least-squares refinement in a
chapter on statistics and errors. However, least squares is one form of
estimation, a technical term used in statistics to refer to the derivation of
numerical quantities from a sample set of data. The mean and standard
deviation are two estimators, and when, for example, intensity data are
merged using (16.3), we are deriving an estimate (using the word in its
technical sense) of the intensity of a reflection given a sample set of data.
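The link between an estimator such as the mean and least squares can be sketched in a few lines of Python. The sketch assumes that merging with (16.3) amounts to an inverse-variance weighted mean (weights of 1/σ²), which is a common choice but is not confirmed here; the function names and the intensity values are hypothetical. It also evaluates a weighted sum of squared deviations, so that one can check numerically that the weighted mean is the value that makes this sum smallest.

```python
import math


def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its standard deviation.

    Assumption: weights are w_i = 1 / sigma_i**2.
    """
    weights = [1.0 / s ** 2 for s in sigmas]
    total_weight = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total_weight
    sigma_mean = 1.0 / math.sqrt(total_weight)
    return mean, sigma_mean


def weighted_sum_of_squares(values, sigmas, estimate):
    """Sum of w_i * (x_i - estimate)**2 with w_i = 1 / sigma_i**2."""
    return sum((x - estimate) ** 2 / s ** 2 for x, s in zip(values, sigmas))


if __name__ == "__main__":
    # Hypothetical repeated measurements of one reflection intensity.
    intensities = [100.0, 104.0, 98.0]
    sigmas = [2.0, 4.0, 2.0]
    merged, sigma_merged = weighted_mean(intensities, sigmas)
    print(merged, sigma_merged)
    # Moving the estimate away from the weighted mean increases the sum.
    for trial in (merged - 0.5, merged, merged + 0.5):
        print(trial, weighted_sum_of_squares(intensities, sigmas, trial))
```

In this sense the merged intensity is itself a one-parameter least-squares estimate: the model is a single constant, and the weighted mean is the constant that minimizes the weighted sum of squared deviations.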
Least squares is the most important estimation procedure in physical
science. In least squares we estimate numerical values of parameters from
a dataset (including co-ordinates, displacement parameters, standard
uncertainties, etc.). We minimize a quantity
$$\chi^2 = \sum w_i (Y_o - Y_c)^2 = \sum w \Delta^2, \qquad (16.16)$$