10 2 The Sample and Its Properties
examples of descriptive statistics. Rather than making inferences about the
population from a sample, which is a staple of statistics, descriptive
statistics is concerned with the description, summary, and presentation of the
sample itself. For example, numerical summaries of a sample could be measures
of location (mean, median, percentiles, mode, extrema), measures of
variability (sample standard deviation/variance, robust versions of the vari-
ance, range of data, interquartile range, etc.), higher-order statistics (kth mo-
ments, kth central moments, skewness, kurtosis), and functions of descriptors
(coefficient of variation). Graphical summaries of samples involve various vi-
sual presentations such as box-and-whisker plots, pie charts, histograms, em-
pirical cumulative distribution functions, etc. Many basic data descriptors are
used in everyday data manipulation.
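As a minimal sketch of the numerical summaries listed above, the following MATLAB lines compute a few of them using only base functions. The sample x is made up purely for illustration, and the skewness and kurtosis are computed here from central moments with divisor n, which is one common convention and an assumption of this sketch.

```matlab
% Hypothetical sample, made up purely for illustration
x = [2 4 4 5 7 9 11];

% Measures of location
xbar = mean(x);            % sample mean
med  = median(x);          % sample median
mo   = mode(x);            % most frequent value
ext  = [min(x) max(x)];    % extrema

% Measures of variability
s2 = var(x);               % sample variance (divisor n-1)
s  = std(x);               % sample standard deviation (divisor n-1)
r  = max(x) - min(x);      % range of the data

% Higher-order statistics: kth central moments (divisor n, assumed here)
m2 = mean((x - xbar).^2);
m3 = mean((x - xbar).^3);
m4 = mean((x - xbar).^4);
skew = m3 / m2^(3/2);      % skewness
kurt = m4 / m2^2;          % kurtosis

% A function of descriptors: coefficient of variation
cv = s / xbar;
```

For this sample the mean is 6, the median is 5, the mode is 4, the sample variance is 10, and the range is 9; the positive skewness reflects the longer right tail.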
Ultimately, exploratory data analysis and descriptive statistics contribute
to the principal goal of statistics – inference about population descriptors – by
guiding how the statistical models should be set up.
It is important to note that descriptive statistics and exploratory data
analysis have recently regained importance due to the ever-increasing sizes of
data sets. Some complex data structures require several terabytes of memory
just to be stored. Thus, preprocessing, summarizing, and dimension-reduction
steps are needed to prepare such data for inferential tasks such as
classification, estimation, and testing. Consequently, the inference is based
on data summaries (descriptors, features) rather than the raw data themselves.
Many data management software programs have elaborate numerical and
graphical capabilities. MATLAB provides an excellent environment for data
manipulation and presentation with superb handling of data structures and
graphics. In this chapter we intertwine some basic descriptive statistics with
MATLAB programming using data obtained from real-life research laboratories.
Most of the statistics are already built-in; for some we will write custom
code in the form of m-functions or m-scripts.
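To illustrate what a custom m-function might look like, here is a minimal sketch of a function file for the coefficient of variation. The name coefvar is hypothetical and chosen only for this example; it is not a built-in MATLAB function, and the file is assumed to be saved as coefvar.m on the MATLAB path.

```matlab
function cv = coefvar(x)
% COEFVAR  Coefficient of variation of a sample (hypothetical helper).
%   cv = coefvar(x) returns std(x)/mean(x) for a numeric vector x.
%   The sample standard deviation uses the default divisor n-1.
    cv = std(x) / mean(x);
end
```

Once the file is saved, a call such as coefvar([2 4 4 5 7 9 11]) at the command prompt returns the ratio of the sample standard deviation to the sample mean of that vector.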
This chapter has two goals: (i) to help you gently relearn and refresh
your MATLAB programming skills through annotated sessions and, at the same
time, (ii) to introduce some basic statistical measures, many of which
should already be familiar to you. Many of the statistical summaries will be
revisited later in the book in the context of inference. You are encouraged to
consult MATLAB’s online help pages regularly for support, since many
programming details and command options are omitted in this text.
2.2 A MATLAB Session on Univariate Descriptive Statistics
In this section we will analyze data derived from an experiment step by step,
with brief explanations of the MATLAB commands used. The whole session