You will recognize the similarity between this example and those
presented in Section 4.3, which dealt with the binomial distribution. The
variable obesity is a dichotomous variable, since an individual can be
classified into one or the other of two mutually exclusive categories, obese
or not obese. In Section 4.3, we were given similar information and were asked
to find the number with the characteristic of interest, whereas here we are
seeking the proportion in the sample possessing the characteristic of
interest. We could, with a sufficiently large table of binomial probabilities,
such as Table B, determine the probability associated with the number
corresponding to the proportion of interest. As we will see, this will not be
necessary, since an alternative procedure is available that is generally more
convenient when sample sizes are large.
■
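The exact binomial route mentioned in the example can be sketched in a few lines of Python, computing the tail probability directly rather than reading it from a table. This is only an illustration: n = 150 and p = .31 are the figures of the obesity example, while the count k = 60 (a sample proportion of .40) and the function names are hypothetical choices made here.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial variable with n trials and success probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(k, n, p):
    """P(X >= k), obtained by summing the binomial probabilities from k to n."""
    return sum(binomial_pmf(x, n, p) for x in range(k, n + 1))

n, p = 150, 0.31   # sample size and population proportion from the obesity example
k = 60             # hypothetical count; corresponds to a sample proportion of k/n = .40

prob = upper_tail(k, n, p)
print(f"P(X >= {k}) = P(p_hat >= {k / n:.2f}) = {prob:.4f}")
```

The sum runs over 91 terms here, which hints at why a normal approximation is the more convenient tool when n is large.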
Sampling Distribution of $\hat{p}$: Construction The sampling distribution
of a sample proportion would be constructed experimentally in exactly the same
manner as was suggested in the case of the arithmetic mean and the difference between two
means. From the population, which we assume to be finite, we would take all possible
samples of a given size and for each sample compute the sample proportion, $\hat{p}$. We would
then prepare a frequency distribution of $\hat{p}$ by listing the different distinct values of
$\hat{p}$ along with their frequencies of occurrence. This frequency distribution (as well as the
corresponding relative frequency distribution) would constitute the sampling distribution
of $\hat{p}$.
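The construction just described can be carried out by brute force for a very small population. The sketch below is a minimal illustration under stated assumptions: the five-member population, the sample size of 2, and the function name are all hypothetical, and "all possible samples" is taken to mean samples drawn without replacement.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def sampling_distribution_of_proportion(population, n):
    """Enumerate every sample of size n (without replacement) and tally p-hat.

    Returns a dict mapping each distinct p-hat to (frequency, relative frequency).
    """
    counts = Counter()
    for sample in combinations(population, n):
        p_hat = Fraction(sum(sample), n)  # proportion of 1s in this sample
        counts[p_hat] += 1
    total = sum(counts.values())
    return {p_hat: (freq, Fraction(freq, total))
            for p_hat, freq in sorted(counts.items())}

# Hypothetical population of N = 5: 1 marks the characteristic, 0 its absence.
population = [1, 1, 0, 0, 0]
dist = sampling_distribution_of_proportion(population, n=2)

for p_hat, (freq, rel) in dist.items():
    print(f"p_hat = {p_hat}: frequency {freq}, relative frequency {rel}")

# The mean of the sampling distribution, weighted by relative frequency.
mean_p_hat = sum(p_hat * rel for p_hat, (freq, rel) in dist.items())
print("mean of p_hat:", mean_p_hat)
```

For this toy population the population proportion is 2/5, and the mean of the enumerated sample proportions comes out to the same value.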
Sampling Distribution of $\hat{p}$: Characteristics When the sample size
is large, the distribution of sample proportions is approximately normally distributed by
virtue of the central limit theorem. The mean of the distribution, $\mu_{\hat{p}}$, that is, the
average of all the possible sample proportions, will be equal to the true population
proportion, p, and the variance of the distribution, $\sigma^2_{\hat{p}}$, will be equal to $p(1 - p)/n$ or
$pq/n$, where $q = 1 - p$. To answer probability questions about p, then, we use the following
formula:

$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} \qquad (5.5.1)$$
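As a rough illustration of Equation 5.5.1, the following sketch computes the mean and variance of the sampling distribution and the resulting z value. The function names and the numerical inputs (p = .50, n = 100, and an observed sample proportion of .60) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math

def proportion_sampling_params(p, n):
    """Mean and variance of the sampling distribution of the sample proportion."""
    mean = p                     # the mean equals the population proportion
    variance = p * (1 - p) / n   # p(1 - p)/n, equivalently pq/n
    return mean, variance

def z_score(p_hat, p, n):
    """Standardize a sample proportion: (p_hat - p) / sqrt(p(1 - p)/n)."""
    _, variance = proportion_sampling_params(p, n)
    return (p_hat - p) / math.sqrt(variance)

# Hypothetical inputs, not taken from the text.
mean, variance = proportion_sampling_params(p=0.50, n=100)
z = z_score(p_hat=0.60, p=0.50, n=100)
print(f"mean = {mean}, variance = {variance:.4f}, z = {z:.2f}")
```

With these inputs the standard error is .05, so a sample proportion of .60 lies about two standard errors above the mean.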
The question that now arises is, How large does the sample size have to be for the
use of the normal approximation to be valid? A widely used criterion is that both np and
n(1 - p) must be greater than 5, and we will abide by that rule in this text.
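The criterion that np and n(1 - p) both exceed 5 is simple to check mechanically. The helper below is a minimal sketch with a hypothetical name; the first call uses the obesity example's n = 150 and p = .31, and the second uses a made-up small sample to show a case that fails.

```python
def normal_approximation_ok(n, p, threshold=5):
    """Return True when both np and n(1 - p) are greater than the threshold."""
    return n * p > threshold and n * (1 - p) > threshold

# Obesity example: np = 46.5 and n(1 - p) = 103.5, so the rule is satisfied.
print(normal_approximation_ok(150, 0.31))

# Hypothetical small sample: np = 3.1, so the rule is not satisfied.
print(normal_approximation_ok(10, 0.31))
```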
We are now in a position to answer the question regarding obesity in the sample of
150 individuals from a population in which 31 percent are obese. Since both np and
n(1 - p) are greater than 5 ($150 \times .31 = 46.5$ and $150 \times .69 = 103.5$), we can say
that, in this case, $\hat{p}$ is approximately normally distributed with a mean $\mu_{\hat{p}} = p = .31$ and
$\sigma^2_{\hat{p}} = p(1 - p)/n = (.31)(.69)/150 = .001426$. The probability we seek is the area
152 CHAPTER 5 SOME IMPORTANT SAMPLING DISTRIBUTIONS