by 839 (i.e. bar(y, f/839)). The information provided in Figure 3.3 is also
conveyed in Table 3.4. Note that the assumption of random sampling, perfect
heterogeneity of the sample, and no bias in the sampling is implicit when a relative
frequency diagram is used to approximate a probability distribution. To calculate
the probability or relative frequency of occurrence of the population variable
within one or more intervals, determine the fraction of the histogram’s total area
that lies between the subset of values that define the interval(s). This method is
very useful when the population variable is continuous and the probability
distribution is approximated by a smooth curve.
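The area calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than a method prescribed by the text: the function name interval_probability, the bin edges, and the counts are hypothetical, and observations are assumed to be spread uniformly within each bin, so that a partly covered bar contributes in proportion to the covered width.

```python
def interval_probability(edges, heights, lo, hi):
    """Fraction of the histogram's total area lying between lo and hi.

    edges   : ascending bin boundaries, one more entry than heights
    heights : bar heights (frequencies or relative frequencies)
    Assumption: values are spread uniformly within each bin.
    """
    total_area = 0.0
    area_inside = 0.0
    for left, right, height in zip(edges[:-1], edges[1:], heights):
        total_area += height * (right - left)
        # Width of the part of this bar that falls inside [lo, hi].
        overlap = max(0.0, min(hi, right) - max(lo, left))
        area_inside += height * overlap
    return area_inside / total_area


# Hypothetical data: four bins of width 10 holding 100 observations.
edges = [0, 10, 20, 30, 40]
counts = [10, 40, 30, 20]

p = interval_probability(edges, counts, 10, 30)
print("Estimated probability of a value between 10 and 30:", p)
```

Because only the ratio of areas matters, the same result is obtained whether the bar heights are raw frequencies or relative frequencies.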
The shape of the histogram conveys the nature of the distribution of the observed
variable within the sample. For example, in Figure 3.1 the histogram for age 9
children has a single peak, and the distribution falls off symmetrically about the
peak in either direction. The three histograms for ages 11, 12, and 15 are progres-
sively skewed to the right and do not show a symmetrical distribution about the
peak. What does this tell us? If we randomly choose one child from the age 9
population, his/her moderate to vigorous activity level is most likely to lie
close to the peak value. Also, we are equally likely to find a child whose
activity level is greater than the peak value as one whose activity level is
lower. These expectations do not hold for the 11-, 12-, and 15-year-old populations,
based on the trends shown in the histograms plotted in Figure 3.1. In a skewed
distribution, the peak value is usually not equal to the mean value.
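A small numerical check can make the last point concrete. The two samples below are invented for illustration and are not taken from Figure 3.1; the sketch uses Python's standard statistics module to compare the most frequent value (the peak) with the mean.

```python
import statistics

# Hypothetical samples, chosen only to illustrate the idea.
symmetric_sample = [1, 2, 2, 3, 3, 3, 4, 4, 5]          # single central peak
right_skewed_sample = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]   # long tail to the right

for name, sample in [("symmetric", symmetric_sample),
                     ("right-skewed", right_skewed_sample)]:
    peak = statistics.mode(sample)   # most frequently observed value
    mean = statistics.mean(sample)   # arithmetic average
    print(name, "peak =", peak, "mean =", mean)
```

In the symmetric sample the peak and the mean coincide, whereas in the right-skewed sample the few large values in the tail pull the mean above the peak.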
3.4.1 Binomial distribution
Some experiments have only two mutually exclusive outcomes, often defined as
“success” or “failure,” such as live or dead, male or female, child or adult, HIV+
or HIV−. The outcome that is of interest to us is defined as a “successful event.” For
example, if we randomly choose a person from a population to ascertain if the
person is exposed to secondhand smoke on a daily basis, then we can classify two
outcomes: either the person selected from the population is routinely exposed to
secondhand smoke, or the person is not routinely exposed to secondhand smoke. If
the selection process yields an individual who is exposed to secondhand smoke daily,
then the event is termed a success. If three people are selected from a population,
and all are routinely exposed to secondhand smoke, then this amounts to three
“successes” obtained in a row.
An experiment that produces only two mutually exclusive and exhaustive
outcomes is called a Bernoulli trial, named after Jacques (James) Bernoulli
(1654–1705), a Swiss mathematician who made significant contributions to the
development of the binomial distribution. For example, a coin toss can only
result in heads or tails. For an unbiased coin and a fair flip, heads and tails
are equally likely (50 : 50). If obtaining a head is defined
as the “successful event,” then the probability of success on flipping a coin is 0.5.
If several identical Bernoulli trials are performed in succession, and each
Bernoulli trial is independent of the others, then the experiment is called a
Bernoulli process.
A Bernoulli process is defined by the following three characteristics:
(1) each trial has only two possible outcomes, which are mutually exclusive;
(2) each trial is independent;
(3) the probability of success is the same in each trial.
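The three characteristics can be illustrated with a short simulation. This is only a sketch under stated assumptions: the function name bernoulli_process, the number of trials, and the seed are hypothetical choices, and the pseudo-random generator in Python's standard random module stands in for truly independent trials.

```python
import random


def bernoulli_process(n_trials, p_success, rng):
    """Simulate n_trials Bernoulli trials; 1 = success, 0 = failure.

    Each trial has only two outcomes, is drawn separately from the
    others, and uses the same success probability p_success.
    """
    return [1 if rng.random() < p_success else 0 for _ in range(n_trials)]


# Hypothetical setup: 1000 fair coin tosses with "heads" as the success.
rng = random.Random(0)  # fixed seed so that the run is repeatable
outcomes = bernoulli_process(1000, 0.5, rng)

successes = sum(outcomes)
print(successes, "successes in", len(outcomes), "trials")
print("Observed fraction of successes:", successes / len(outcomes))
```

For a large number of trials, the observed fraction of successes is expected to lie close to the success probability of a single trial.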