of time. Inferential statistics is based solely on probability sampling. It is the only type of sampling that lends itself to theoretical specification of the sampling distributions of statistics (discussed below), which form the basis of statistical inference. The simplest type of probability sample is the simple random sample, in which each member of the population has the same chance of being selected into the sample. If n cases are to be selected from a population of size N, each population member has a probability of selection of n/N.
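As a concrete illustration, a simple random sample can be drawn with standard tools. The sketch below is a minimal example in Python (the population labels A through E are hypothetical); random.sample selects n of the N members without replacement, so each member has the same inclusion probability, n/N.

```python
import random

# Hypothetical population of N = 5 members (labels are illustrative)
population = ["A", "B", "C", "D", "E"]
n = 3

# random.sample draws n members without replacement; every member
# has the same probability of being included, n/N = 3/5
sample = random.sample(population, n)
print(sample)  # e.g., ['B', 'E', 'A']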
Sample statistics such as the sample mean and variance of a variable (ȳ and s², respectively) or a sample regression coefficient (b) indicating the effect of a predictor on an outcome are estimates of corresponding population parameters. Let θ denote any population parameter, and θ̂ the sample estimator of that parameter. In making inferences about θ based on the observed value of θ̂, we need to understand the nature of the relationship between the two. The sampling distribution of a statistic is critical to this enterprise: It is a probability distribution for a sample statistic. That is, it is an enumeration of all possible values of θ̂, together with their associated probabilities of occurrence, that would be obtained through an infinite repetition of collecting samples of size n from that population and recomputing θ̂. Although we collect only one sample and compute one value of θ̂ in practice, it is important to understand that the full distribution of θ̂ could be generated for any statistic via repeated sampling. The importance of this distribution is that it indicates the probability that θ̂ is within a specified "distance" from θ. It therefore places bounds on the degree to which we are in error in using θ̂ as an estimate of θ or in using θ̂ to test a hypothesis about θ.
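The repeated-sampling idea can be mimicked by simulation. The sketch below is a minimal Python illustration (the population values are assumed for the example, not taken from the text): it draws many samples of size n, records each sample mean, and tabulates the relative frequencies, which approximate the sampling distribution of ȳ.

```python
import random
from collections import Counter

# Assumed illustrative population values (not from the text)
population = [1, 2, 3, 4, 5]
n = 3
reps = 100_000

# Draw many samples of size n and record each sample mean; the
# relative frequencies approximate the sampling distribution of ybar
means = Counter()
for _ in range(reps):
    sample = random.sample(population, n)
    means[round(sum(sample) / n, 2)] += 1

for value, count in sorted(means.items()):
    print(f"ybar = {value}: relative frequency {count / reps:.3f}")
```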
Table 1A.1 presents a very simple illustration of the sampling distributions for the sample mean, ȳ, and the sample variance, s². As is evident in the table, the "population" consists of only five observations: A, B, C, D, and E. (The population is artificially small to keep the number of different samples manageable.) For each observation, a value is recorded for the variable Y. The mean of Y, or µ, for this population is 3 (as is easily verified), and the variance of Y, or σ², is 2. [This is also easily verified, keeping in mind that for the population, the variance of Y is σ² = Σ(Y − µ)²/N, where N is the population size.] If we draw samples from this population of n = 3, without replacement, there are 10 different possible samples that can be drawn. These are shown in the table along with the Y-values of the sample members and the sample mean and variance for each sample. The RF columns indicate the relative frequency of occurrence of each value of the sample mean and variance, respectively. These columns represent the sampling distributions of each statistic, since they indicate the probabilities associated with each different value of the sample statistics.
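Table 1A.1 itself is not reproduced in this excerpt; for concreteness, the short verification below assumes the Y-values 1, 2, 3, 4, 5 for members A through E, one assignment that is consistent with the stated µ = 3 and σ² = 2:

```latex
% Assumed Y-values (Table 1A.1 not reproduced here): Y = 1, 2, 3, 4, 5
\mu = \frac{1 + 2 + 3 + 4 + 5}{5} = 3,
\qquad
\sigma^2 = \frac{(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2}{5} = \frac{10}{5} = 2.
```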
For the sample mean, it is clear that when drawing a sample of size 3 from this population, certain values of ȳ, such as 3.33, 3, and 2.67, are twice as likely as other values. Similarly, the most likely value for s² is 2.33. We can also compute the average of the 10 sample means, denoted E(ȳ). We find that it is 3, the same as the population mean of Y. This is no accident, since it is always true that E(ȳ) = µ. This means that the sample mean is an unbiased estimator of the population mean: its average value equals the population parameter. The average sample variance, or E(s²), is 2.5, which in this case is not equal to the population variance of 2.
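These figures can be checked by exhaustive enumeration. The sketch below again assumes the Y-values 1 through 5 (which reproduce the numbers quoted in the text); it lists all 10 samples of size 3, computes each sample mean and variance, and averages them.

```python
from itertools import combinations
from statistics import mean, variance

# Assumed Y-values for members A-E, consistent with mu = 3 and sigma^2 = 2
Y = [1, 2, 3, 4, 5]

samples = list(combinations(Y, 3))           # all 10 samples of size n = 3
means = [mean(s) for s in samples]
variances = [variance(s) for s in samples]   # s^2 uses divisor n - 1

print(len(samples))        # 10
print(mean(means))         # 3.0 -> E(ybar) = mu
print(mean(variances))     # 2.5 -> E(s^2) differs from sigma^2 = 2 here
```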
However, under ordinary sampling conditions with infinite (or approximately infinite) populations, it is the case that E(s²) = σ². With finite populations we must apply a finite