16.2 Random errors and distributions
The standard deviation can be calculated using either:

$$\sigma^2 = \frac{N}{\sum_i w_i}, \tag{16.4}$$

or

$$\sigma^2 = \frac{N}{N-1}\,\frac{\sum_i w_i (x_i - \bar{x})^2}{\sum_i w_i}. \tag{16.5}$$
The first is more common, but in the crystallographic intensity
data-merging program SORTAV, for example, where these quantities are
referred to as $\sigma^2_{\mathrm{ext}}$ and $\sigma^2_{\mathrm{int}}$, both are calculated and the
larger of the two taken (Blessing, 1997). Choice of weights, $w_i$, has
become something of a subdiscipline of statistics (see Section 16.4), but
a common choice when averaging a set of measurements $x_i$ with
precision $\sigma(x_i)$ is to use $w_i = 1/\sigma^2(x_i)$.
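As a minimal sketch of the two estimates, reading eqn (16.4) as N divided by the sum of the weights and eqn (16.5) as N/(N − 1) times the weighted mean-square deviation, with weights 1/σ²(x_i) and the usual weighted mean. The function name and the sample values are hypothetical, and this is not SORTAV's own code; it only illustrates computing both quantities and keeping the larger.

```python
def weighted_variances(x, sigma):
    """Return (weighted mean, variance from eqn 16.4, variance from eqn 16.5).

    Assumes weights w_i = 1 / sigma_i**2 (the common choice named in the text).
    """
    if len(x) != len(sigma) or len(x) < 2:
        raise ValueError("need at least two measurements with matching sigmas")
    n = len(x)
    w = [1.0 / s ** 2 for s in sigma]
    sum_w = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    # Eqn (16.4): depends only on the quoted precisions.
    var_from_weights = n / sum_w
    # Eqn (16.5): depends on the observed scatter about the weighted mean.
    scatter = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x))
    var_from_scatter = (n / (n - 1)) * scatter / sum_w
    return mean, var_from_weights, var_from_scatter


# Hypothetical measurements and their standard uncertainties.
x = [10.1, 9.8, 10.4, 10.0]
sigma = [0.2, 0.2, 0.3, 0.1]
mean, v_w, v_s = weighted_variances(x, sigma)
# Take the larger of the two estimates, as the text describes.
print(mean, max(v_w, v_s) ** 0.5)
```

With equal weights the second estimate reduces to the ordinary sample variance, which is a convenient sanity check on any implementation.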
Other quantities that may be quoted are the median, mode, skewness
and kurtosis (or curtosis) of the data. The median of a sample of data
values is the middle value of the data set when the values are placed in
ascending order. If the sample size is even, then the median is defined as
being half-way between the two middle values. The median is important
because it is less sensitive to large outliers than the mean. As an
illustration, suppose the following set of measurements was made for a
particular quantity: 0.9, 1.1, 1.2, 1.5, 10.0. The value 10.0 is obviously
an outlier (a mistake). The outlier strongly affects the value of the
mean: 2.94 with the outlier, 1.18 without. The median, by contrast, is
affected much less: 1.2 with the outlier, 1.15 without. This property is
called robustness.
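The comparison can be reproduced with Python's standard `statistics` module. This is only a sketch; the helper name `mean_and_median` is an assumption introduced for illustration.

```python
from statistics import mean, median


def mean_and_median(values):
    """Return the arithmetic mean and the median of a list of numbers."""
    return mean(values), median(values)


data = [0.9, 1.1, 1.2, 1.5, 10.0]   # measurements from the text
without_outlier = data[:-1]         # drop the suspect value 10.0

for label, sample in (("with outlier", data), ("without outlier", without_outlier)):
    m, med = mean_and_median(sample)
    # Format to two decimals for comparison with the values quoted in the text.
    print(f"{label}: mean = {m:.2f}, median = {med:.2f}")
```

Removing the single outlier shifts the mean by about 1.8 but the median by only 0.05.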
Table 16.1. Statistical descriptors for the intensities of the 114 reflection.

  Mean, $\bar{x}$                       1809.9
  Sample standard deviation, $\sigma$   32.8
  Median                                1808.9
  Skew                                  −0.39
  Kurtosis                              3.02
  Number of data                        67

The mode is the most common value in a set of data, corresponding
to the maximum in a histogram. The sample skewness is a measure of
the symmetry of a distribution, and the kurtosis measures its peakiness.
Formulae are given in statistics textbooks [e.g. Barlow (1997), p. 14].
Values of the mean, sample standard deviation, median, skewness and
kurtosis for the data in Fig. 16.1 are given in Table 16.1. The negative skew
means that the data tail off to the left; the kurtosis value is interpreted
below.
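For orientation, skewness and kurtosis can be sketched from the central moments of the sample. The normalisation used here (dividing by N, and quoting a kurtosis for which a normal distribution gives 3) is an assumption: textbooks differ in detail, and some subtract 3 to report an excess kurtosis. The function name is hypothetical.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness m3 / m2**1.5 and kurtosis m4 / m2**2.

    Assumed convention: central moments divided by N; a normal
    distribution has kurtosis 3 (no subtraction of 3).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; moments ratios are undefined")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical sample with a tail towards low values: skewness is negative.
skew, kurt = skewness_and_kurtosis([1.0, 8.0, 9.0, 10.0])
print(skew, kurt)
```

A sample that is symmetric about its mean gives zero skewness under this definition, whatever its kurtosis.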
The mode, skewness and kurtosis seem to be encountered rather
rarely in crystallography. Indeed, Barlow (1997) says: 'Kurtosis is not used
much by physicists, chemists, or indeed anyone else. It is a really obscure and
arcane quantity whose main use is inspiring awe in demonstrators, professors
or anyone else you are trying to impress.'
16.2.3 Theoretical distributions
The shape of the histogram in Fig. 16.1 can be described using a
mathematical function called a probability distribution function, or pdf.
There are many such functions, some familiar ones being the binomial,
Poisson, normal, and uniform distributions. By far the most important in