
The sample variance ($S_X^2$) and the sample standard deviation ($S_X$) are perfectly
accurate for describing a sample, but their formulas are not designed for estimating
the population. To accurately estimate a population, we should have a sample of
random scores, so here we need a sample of random deviations. Yet, when we measure
the variability of a sample, we use the mean as our reference point, so we encounter
the restriction that the sum of the deviations must equal zero. Because of this, not
all deviations in the sample are “free” to be random and to reflect the variability
found in the population. For example, say that the mean of five scores is 6 and that four
of the scores are 1, 5, 7, and 9. Their deviations are -5, -1, 1, and 3, so the sum
of their deviations is -2. Therefore, the final score must be 8, because it must have a
deviation of 2 so that the sum of all deviations is zero. Thus, the deviation for this
score is determined by the other scores and is not a random deviation that reflects
the variability found in the population. Instead, only the deviations produced by the
four scores of 1, 5, 7, and 9 reflect the variability found in the population. The same
would be true for any four of the five scores. Thus, in general, out of the $N$ scores in
a sample, only $N - 1$ of them (the $N$ of the sample minus 1) actually reflect the
variability in the population.
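The restriction can be checked with a few lines of Python. This is only an illustrative sketch using the scores from the example; the function name `forced_final_score` is our own and does not come from the text.

```python
def forced_final_score(mean, known_scores):
    """Return the one remaining score that makes all deviations sum to zero.

    Hypothetical helper for illustration: `known_scores` holds N - 1 of the
    N scores in a sample whose mean is `mean`.
    """
    # Deviations of the known scores from the mean.
    known_deviations = [x - mean for x in known_scores]
    # The last deviation must cancel the sum of the others.
    final_deviation = -sum(known_deviations)
    return mean + final_deviation


four_scores = [1, 5, 7, 9]
print([x - 6 for x in four_scores])        # [-5, -1, 1, 3]
print(forced_final_score(6, four_scores))  # 8
```

Once the mean and any four scores are fixed, the fifth score has no freedom left, which is why only $N - 1$ deviations carry information about variability.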
The problem with the biased estimators ($S_X$ and $S_X^2$) is that these formulas divide by
$N$. Because we divide by too large a number, the answer tends to be too small. Instead,
we should divide by $N - 1$. By doing so, we compute the unbiased estimators of the
population variance and standard deviation. The definitional formulas for the unbiased
estimators of the population variance and standard deviation are
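Estimated Population Variance

$$s_X^2 = \frac{\Sigma (X - \bar{X})^2}{N - 1}$$

Estimated Population Standard Deviation

$$s_X = \sqrt{\frac{\Sigma (X - \bar{X})^2}{N - 1}}$$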
Notice that we call them the estimated population variance and the estimated
population standard deviation. These formulas are almost the same as the previous
defining formulas that we used with samples: The standard deviation is again the
square root of the variance, and in both the core computation is to determine the
amount each score deviates from the mean and then compute something like an
“average” deviation. The only novelty here is that, when calculating the estimated
population standard deviation or variance, the final division involves $N - 1$.
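As a worked illustration of the two definitional formulas, the following Python sketch computes both estimates for the five scores from the earlier example (1, 5, 7, 9, and 8). The function names are our own, chosen for illustration; the last two lines compare the results with `statistics.variance` and `statistics.stdev` from Python's standard library, which also divide by $N - 1$.

```python
import math
import statistics


def sum_of_squared_deviations(scores):
    """Sum of the squared deviations of each score from the sample mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def estimated_population_variance(scores):
    """Unbiased estimate: sum of squared deviations divided by N - 1."""
    return sum_of_squared_deviations(scores) / (len(scores) - 1)


def estimated_population_sd(scores):
    """Square root of the estimated population variance."""
    return math.sqrt(estimated_population_variance(scores))


scores = [1, 5, 7, 9, 8]  # mean = 6, sum of squared deviations = 40

print(estimated_population_variance(scores))  # 10.0
print(estimated_population_sd(scores))        # about 3.162

# The standard-library functions use the same N - 1 divisor.
print(statistics.variance(scores))
print(statistics.stdev(scores))
```

Dividing the same sum of squared deviations by $N = 5$ instead would give 8, the smaller descriptive sample variance.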
The symbol for the unbiased estimator of the standard deviation is the lowercase $s_X$,
and the symbol for the unbiased estimator of the variance is the lowercase $s_X^2$. To keep
all of your symbols straight, remember that the symbols for the sample involve the
capital or big $S$, and in those formulas you divide by the “big” value of $N$. The symbols for
estimates of the population involve the lowercase or small $s$, and here you divide by the
smaller quantity, $N - 1$. Further, the small $s$ is used to estimate the small Greek s called
$\sigma$. Finally, think of $s_X^2$ and $s_X$ as the inferential variance and the inferential standard
deviation, because the only time you use them is to infer the variance or standard
deviation of the population based on a sample. Think of $S_X^2$ and $S_X$ as the descriptive variance
and standard deviation because they are used to describe the sample.
REMEMBER $S_X^2$ and $S_X$ describe the variability in a sample; $s_X^2$ and $s_X$
estimate the variability in the population.
For future reference, the quantity $N - 1$ is called the degrees of freedom. The degrees
of freedom is the number of scores in a sample that are free to reflect the variability in
the population. The symbol for degrees of freedom is $df$, so here $df = N - 1$.
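The claim that dividing by $N$ tends to give too small an answer, while dividing by $N - 1$ does not, can be checked exactly on a toy case. The sketch below rests on assumptions that are ours, not the text's: a tiny hypothetical population of the scores 1, 2, and 3, and every possible sample of size 2 drawn with replacement. It averages the two versions of the variance over all of those samples, using exact fractions to avoid rounding.

```python
from fractions import Fraction
from itertools import product


def sum_of_squares(scores):
    """Sum of squared deviations from the mean, kept as an exact fraction."""
    mean = Fraction(sum(scores), len(scores))
    return sum((x - mean) ** 2 for x in scores)


population = [1, 2, 3]  # hypothetical population, for illustration only
pop_variance = sum_of_squares(population) / len(population)

n = 2  # sample size
# Every possible ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# Average variance across all samples when dividing by N ...
avg_div_n = sum(sum_of_squares(s) / n for s in samples) / len(samples)
# ... and when dividing by N - 1 (the degrees of freedom).
avg_div_df = sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)

print(pop_variance)  # 2/3
print(avg_div_n)     # 1/3, too small on average
print(avg_div_df)    # 2/3, matches the population variance
```

In this small case the $N$ version averages only half of the true population variance, whereas the $N - 1$ version averages exactly to it.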