2.6 Multidimensional Samples: Fisher’s Iris Data and Body Fat Data
it does not affect the analysis. For example, the vector (2.1314757, 4.9956301,
6.1912772) could probably be simplified to (2.13, 5.00, 6.19); (ii) Organize the
numbers so that comparisons are made down columns rather than across rows;
and (iii) The user’s cognitive load should be minimized by spacing and table
layout so that the eye does not have to travel far when making comparisons.
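Guideline (i) can be sketched in MATLAB as follows. This is only a minimal illustration, assuming a release in which round accepts the 'significant' option (R2014b or later); the variable names x and xr are ours and do not come from the text.

```matlab
% Round each value to three significant digits (guideline (i)).
x  = [2.1314757, 4.9956301, 6.1912772];
xr = round(x, 3, 'significant');   % assumes R2014b or later

% Print with a fixed number of decimals so the entries line up in a table.
fprintf('%.2f  ', xr);
fprintf('\n');
```

Printing with a fixed format such as %.2f keeps the decimal points aligned, which also serves guideline (iii).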
Fisher’s Iris Data. An example of multivariate data is provided by the
celebrated Fisher’s iris data. Plants of the family Iridaceae grow on every
continent except Antarctica. With a wealth of species, identification is not simple.
Even iris experts sometimes disagree about how some flowers should be
classified. Fisher’s (Anderson, 1935; Fisher, 1936) data set contains measurements
on three North American species of iris: Iris setosa canadensis, Iris versicolor,
and Iris virginica (Fig. 2.8a-c). The 4-dimensional measurements on each of
the species consist of sepal and petal length and width.
Fig. 2.8 (a) Iris setosa (photo by C. Hensler, The Rock Garden), (b) Iris virginica, and
(c) Iris versicolor; (b) and (c) are photos by D. Kramb, SIGNA.
The data set fisheriris is part of the MATLAB distribution (it ships with the
Statistics and Machine Learning Toolbox) and contains two variables: meas and
species. The variable meas, shown in Fig. 2.9a, is a 150 × 4 matrix with one row
per flower, 50 rows for each species. Each row in the matrix meas contains four
elements: sepal length, sepal width, petal length, and petal width. Note that
the convention in MATLAB is to store variables as columns and observations as
rows. The variable species is a cell array containing the species names for the
150 measurements.
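The layout described above can be checked directly. The following is a small sketch, assuming the Statistics and Machine Learning Toolbox is installed so that fisheriris can be loaded; the names nObs, nVars, names, isSetosa, and sepalLenSet are introduced here for illustration only.

```matlab
load fisheriris                      % provides meas (150x4) and species (150x1 cell)

[nObs, nVars] = size(meas);          % rows = observations, columns = variables
names = unique(species);             % distinct species labels, sorted alphabetically

% Select rows by species label rather than by hard-coded row ranges.
isSetosa    = strcmp(species, 'setosa');
sepalLenSet = meas(isSetosa, 1);     % column 1 = sepal length

fprintf('%d observations, %d variables, %d species\n', ...
        nObs, nVars, numel(names));
```

Selecting rows by label gives the same values as indexing rows 1 to 50, but it does not depend on the order in which the species are stored.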
The following MATLAB commands plot the data and compare sepal lengths
among the three species.
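load fisheriris               % loads meas (150x4) and species
s1 = meas(1:50, 1);           % setosa, sepal length
s2 = meas(51:100, 1);         % versicolor, sepal length
s3 = meas(101:150, 1);        % virginica, sepal length
s = [s1 s2 s3];               % 50x3 matrix, one column per species
figure;
imagesc(meas)                 % display the data matrix as a color-coded image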