significance tests for normality, but these are generally not helpful, partly
because they depend on the number of data and partly because they do not tell
you in what way a distribution departs from normal. We illustrate this
weakness below. You can try fitting theoretical distributions, with parameters
estimated from the data, to the histogram. If the histogram appears erratic
then another way of examining the data for normality is to compute the
cumulative distribution and plot it against the normal probability on normal
probability paper. This paper has an ordinate scaled in such a way that a
normal cumulative distribution appears as a straight line. Alternatively, you
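can compute the normal equivalent deviate for each cumulative probability p;
this is the value of z for which the area under the standard normal curve to
the left of z is p. If the data are normal, these deviates plotted against the
data values on ordinary linear scales also fall on a straight line. A strong
deviation from the line indicates non-normality, and you can try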
drawing the cumulative distributions of transformed data to see which gives a
reasonable fit to the line before deciding whether to transform and, if so, in
what way.
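The normal equivalent deviate check can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure. It assumes the plotting position p = (i - 0.5)/n for the i-th smallest of n values (one of several conventions), uses the correlation between the sorted values and their deviates as a single-number summary of how straight the line is, and runs on synthetic, seeded lognormal data rather than real measurements.

```python
import math
import random
from statistics import NormalDist


def normal_equivalent_deviates(values):
    """Return the sorted values and their normal equivalent deviates.

    Assumption: the plotting position for the i-th smallest of n values is
    p = (i - 0.5) / n, which keeps every p strictly between 0 and 1.
    """
    xs = sorted(values)
    n = len(xs)
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    # inv_cdf(p) is the z whose left-tail area under N(0, 1) equals p.
    zs = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return xs, zs


def straightness(xs, zs):
    """Pearson correlation of values with deviates (1 means a perfect line)."""
    n = len(xs)
    mx, mz = sum(xs) / n, sum(zs) / n
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / math.sqrt(sxx * szz)


# Synthetic, positively skewed data (seeded lognormal), not field measurements.
rng = random.Random(42)
data = [rng.lognormvariate(3.2, 0.3) for _ in range(434)]

r_raw = straightness(*normal_equivalent_deviates(data))
r_log = straightness(*normal_equivalent_deviates([math.log10(v) for v in data]))
print(f"raw: {r_raw:.4f}  log10: {r_log:.4f}")
```

A correlation closer to 1 for the logarithms than for the raw values is the numerical counterpart of a straighter line on probability paper; the plot itself remains more informative because it shows where the departure occurs.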
To illustrate these effects we turn to the distribution of potassium at Broom’s
Barn Farm. The data are from an original study by Webster and McBratney
(1987). The distribution is shown as a histogram of the measured values in
Figure 2.1(a). Fitted to it is the curve of the lognormal distribution with
parameters as given in Table 2.1. The distribution is positively skewed. The histogram of
the logarithms is shown in Figure 2.1(b). It is approximately symmetric, the
normal pdf fits well, and transforming to logarithms has approximately
normalized the data. Figure 2.2 shows the corresponding box-plots. In
Figure 2.2(a)–(b) they are drawn as ‘box and whisker’ plots in which the boxes
span the interquartile ranges and the whiskers extend to the limits of the data.
In Figure 2.2(c)–(d) the whiskers extend only to ‘fences’, and any points lying
beyond them are plotted individually. The upper fence is the upper quartile
plus 1.5 times the interquartile range, or the maximum if that is
Table 2.1 Summary statistics for exchangeable potassium (K, mg l⁻¹) at Broom’s Barn Farm.

                                                K   log10 K
Minimum                                      12.0    1.0792
Maximum                                      96.0    1.9823
Mean                                        26.31    1.3985
Median                                       25.0    1.3979
Standard deviation                          9.039    0.1342
Variance                                   81.706   0.01800
Skewness                                     2.04      0.39
Kurtosis                                     9.51      0.57
Number of observations                        434       434
χ² for normal fit (18 degrees of freedom)   174.4      43.6
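The statistics in Table 2.1, and the fences used for the box-plots in Figure 2.2(c)–(d), can be computed along the lines of the Python sketch below. It is not the authors' code, and several conventions in it are assumptions: skewness and kurtosis use the simple moment estimators (divisor n), kurtosis is reported as excess kurtosis (0 for a normal distribution), the variance uses divisor n - 1, quartiles come from `statistics.quantiles` with its default method, and the lower fence is taken to mirror the upper one. The data are synthetic, seeded lognormal values, not the Broom’s Barn measurements. Values computed under other conventions can differ slightly.

```python
import math
import random
from statistics import mean, median, quantiles, variance


def moment_coefficients(values):
    """Skewness m3 / m2**1.5 and excess kurtosis m4 / m2**2 - 3 (divisor n)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def fences(values):
    """Quartiles -/+ 1.5 * IQR, clipped to the data limits (assumed convention)."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lower = max(q1 - 1.5 * iqr, min(values))
    upper = min(q3 + 1.5 * iqr, max(values))
    return lower, upper


def summarize(values):
    """Table 2.1-style summary plus the points a fenced box-plot draws singly."""
    skewness, kurtosis = moment_coefficients(values)
    lower, upper = fences(values)
    var = variance(values)  # sample variance, divisor n - 1 (assumed)
    return {
        "minimum": min(values),
        "maximum": max(values),
        "mean": mean(values),
        "median": median(values),
        "standard deviation": math.sqrt(var),
        "variance": var,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "number of observations": len(values),
        "beyond fences": [v for v in values if v < lower or v > upper],
    }


# Synthetic, positively skewed data (seeded lognormal), not field measurements.
rng = random.Random(7)
k = [rng.lognormvariate(math.log(25.0), 0.3) for _ in range(434)]
raw = summarize(k)
logged = summarize([math.log10(v) for v in k])
for name in ("mean", "median", "skewness", "kurtosis"):
    print(f"{name:>10}: {raw[name]:10.4f} {logged[name]:10.4f}")
```

With positively skewed input the skewness of the logarithms should be much closer to 0 than that of the raw values, as in the two columns of Table 2.1.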
Exploratory Data Analysis and Display 23