the behavior of the entire population, the derived estimates may be misleading and
lie far from the actual values.
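A minimal Python sketch can make this point concrete. The assumptions are stated here and in the comments. The "population" is 10,000 randomly generated heart-rate values, a hypothetical stand-in rather than measured data. The non-representative sample is, by construction, the 50 largest values. Comparing each sample mean with the population mean shows how a biased sample pulls the estimate away from the actual value, while a randomly chosen sample stays close to it.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated heart rates (beats/min).
# Values are randomly generated for illustration only, not measured data.
random.seed(1)
population = [random.gauss(70, 8) for _ in range(10_000)]
true_mean = statistics.mean(population)

# A representative sample: 50 individuals chosen uniformly at random.
random_sample = random.sample(population, 50)

# A non-representative sample (assumed for illustration): the 50 individuals
# with the highest values, as if recruitment favored one extreme of the population.
biased_sample = sorted(population)[-50:]

print(f"population mean:    {true_mean:.1f}")
print(f"random-sample mean: {statistics.mean(random_sample):.1f}")
print(f"biased-sample mean: {statistics.mean(biased_sample):.1f}")
```

The random sample's mean typically lands within a few beats/min of the population mean. The biased sample's mean sits far above it.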
We are specifically interested in understanding and quantifying the variability
observed in biological data. What do we mean by variability in biology? Due to
varying environmental and genetic factors, different individuals respond in dissimilar
ways to the same stimulus. Suppose twenty 16-year-old boys are asked to run
for 20 minutes at 6 miles/hour. At the end of this exercise, three variables – the
measured heart rate, the blood pressure, and the breathing rate – will vary from one
boy to another. If we were to graph the measured data for each of the three variables,
we would expect to see a distribution of data such that most of the values lie close to
the average value. A few values might be observed to lie further away from the
average. It is important to realize that the differences in any of these measurements
from the average value are not “errors.” In fact, the differences, or “spread,” of the
data about the average value are very important pieces of information that characterize
the differences inherent in the population. Even if the environmental conditions
are kept constant, variable behavior cannot be avoided. For example, groups
of cells of the same type cultured under identical conditions exhibit different growth
curves. On the other hand, genetically engineered plants with identical genomes may bear
different numbers of flowers or fruit, implying that environmental differences such as
soil conditions, light, or water intake are at work.
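The distinction between spread and error can be sketched numerically. The Python snippet below uses ten invented heart-rate readings. These are hypothetical values chosen for illustration, not data from the exercise described above. The snippet computes each boy's deviation from the average and then summarizes the spread with the sample standard deviation from the standard-library `statistics` module.

```python
import statistics

# Hypothetical heart rates (beats/min) for ten boys after the same exercise;
# the numbers are invented for illustration, not experimental data.
heart_rates = [148, 152, 155, 150, 161, 147, 153, 158, 144, 152]

mean_hr = statistics.mean(heart_rates)

# Deviation of each individual from the average. These are not measurement
# "errors"; they describe how the individuals differ from one another.
deviations = [hr - mean_hr for hr in heart_rates]

# The sample standard deviation condenses that spread into a single number.
spread = statistics.stdev(heart_rates)

print(f"mean = {mean_hr:.1f} beats/min")
print("deviations:", [round(d, 1) for d in deviations])
print(f"sample standard deviation = {spread:.1f} beats/min")
```

Here the mean is 152 beats/min and the deviations sum to zero. The standard deviation, roughly 5 beats/min, is a single number that captures how widely these individuals differ.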
The methods for collecting numerical data, analyzing them, and interpreting them in a
scientifically sound and responsible manner, for the purpose of generating descriptive
numbers that characterize a population of interest, are collectively prescribed by the
field of statistics, a branch of mathematics. The need for statistical evaluations is
pervasive in all fields of biomedicine and bioengineering. As scientists and engineers, we are
required to calculate and report statistics frequently. The usefulness of published
statistics depends on the methods used to choose a representative (unbiased)
sample, obtain unbiased data from the sample, and analyze and interpret the
data. Poorly made decisions in the data collection phase, or poor judgment used
when interpreting a statistical value, can render the resulting statistic useless. The statistician
should be well aware of the ethical responsibility that comes with designing the
experiment that yields the data and with interpreting those data correctly. As part of
the scientific community that collects and utilizes statistical values on a regular
basis, you are advised to regard all published statistical information with a critical
and discerning eye.
Statistical information can be broadly classified into two categories: descriptive
statistics and inferential statistics. Descriptive statistics are numerical quantities that
condense the information available in a set of data or sample into a few numbers,
and exactly specify the characteristics exhibited by the sample, such as average
weight, size, mechanical strength, pressure, or concentration. Inferential statistics
are numerical estimations of the behavior of a population of individuals or items
based on the characteristics of a smaller pool of information, i.e., a representative
sample or data set. Some very useful and commonly employed descriptive statistics
are introduced in Section 3.2. The role of probability in the field of statistics and
some important probability concepts are discussed in Section 3.3. In Section 3.4, we
discuss two widely used discrete probability distributions in biology: the binomial
distribution and the Poisson distribution. The normal distribution, the most widely
used continuous probability distribution, is covered in detail in Section 3.5, along
with an introduction to inferential statistics. In Section 3.6, we digress a little and
discuss an important topic – how errors or variability associated with measured
Probability and statistics