variance of y. The key point is that because both variances in the population R-squared are
unconditional variances, the population R-squared is unaffected by the presence of
heteroskedasticity in $\mathrm{Var}(u|x_1,\ldots,x_k)$. Further, SSR/$n$ consistently estimates
$\sigma_u^2$, and SST/$n$ consistently estimates $\sigma_y^2$, whether or not
$\mathrm{Var}(u|x_1,\ldots,x_k)$ is constant. The same is true when we
use the degrees of freedom adjustments. Therefore, $R^2$ and $\bar{R}^2$ are both consistent estima-
tors of the population R-squared whether or not the homoskedasticity assumption holds.
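To make the claim concrete, write the population R-squared as $1 - \sigma_u^2/\sigma_y^2$, using the unconditional variances referred to above. A minimal sketch of the consistency argument is then
\[
R^2 \;=\; 1 - \frac{\mathrm{SSR}/n}{\mathrm{SST}/n} \;\xrightarrow{p}\; 1 - \frac{\sigma_u^2}{\sigma_y^2},
\]
because $\mathrm{SSR}/n \xrightarrow{p} \sigma_u^2$ and $\mathrm{SST}/n \xrightarrow{p} \sigma_y^2$ whether or not $\mathrm{Var}(u|x_1,\ldots,x_k)$ is constant; the degrees of freedom adjustments in $\bar{R}^2$ vanish as $n$ grows, so the same limit applies to it.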
If heteroskedasticity does not cause bias or inconsistency in the OLS estimators, why
did we introduce it as one of the Gauss-Markov assumptions? Recall from Chapter 3 that
the estimators of the variances, $\mathrm{Var}(\hat{\beta}_j)$, are biased without the homoskedasticity assump-
tion. Since the OLS standard errors are based directly on these variances, they are no longer
valid for constructing confidence intervals and t statistics. The usual OLS t statistics do not
have t distributions in the presence of heteroskedasticity, and the problem is not resolved
by using large sample sizes. We will see this explicitly for the simple regression case in
the next section, where we derive the variance of the OLS slope estimator under
heteroskedasticity and propose a valid estimator in the presence of heteroskedasticity. Sim-
ilarly, F statistics are no longer F distributed, and the LM statistic no longer has an asymp-
totic chi-square distribution. In summary, the statistics we used to test hypotheses under the
Gauss-Markov assumptions are not valid in the presence of heteroskedasticity.
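A small simulation sketch can illustrate the point. It assumes the Python packages numpy and statsmodels and a purely illustrative data generating process in which $\mathrm{Var}(u|x) = x^2$; it compares t tests of a true null hypothesis using the usual and the heteroskedasticity-robust standard errors.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, reps, beta1 = 500, 1000, 1.0
reject_usual, reject_robust = 0, 0

for _ in range(reps):
    x = rng.uniform(1, 5, size=n)
    u = rng.normal(scale=x, size=n)            # Var(u|x) = x^2: heteroskedastic errors
    y = 2.0 + beta1 * x + u
    X = sm.add_constant(x)

    usual = sm.OLS(y, X).fit()                 # usual (nonrobust) standard errors
    robust = sm.OLS(y, X).fit(cov_type="HC1")  # heteroskedasticity-robust standard errors

    # t tests of the true null H0: beta1 = 1 at the nominal 5% level
    reject_usual += abs((usual.params[1] - beta1) / usual.bse[1]) > 1.96
    reject_robust += abs((robust.params[1] - beta1) / robust.bse[1]) > 1.96

print("rejection rate, usual SEs :", reject_usual / reps)
print("rejection rate, robust SEs:", reject_robust / reps)
```

In a setup like this, the rejection rate based on the usual standard errors generally differs from the nominal 5% even with a large sample, while the robust version stays close to it, in line with the discussion above.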
We also know that the Gauss-Markov theorem, which says that OLS is best linear unbi-
ased, relies crucially on the homoskedasticity assumption. If $\mathrm{Var}(u|\mathbf{x})$ is not constant, OLS
is no longer BLUE. In addition, OLS is no longer asymptotically efficient in the class of
estimators described in Theorem 5.3. As we will see in Section 8.4, it is possible to find
estimators that are more efficient than OLS in the presence of heteroskedasticity (although
it requires knowing the form of the heteroskedasticity). With relatively large sample sizes, it
might not be so important to obtain an efficient estimator. In the next section, we show how
the usual OLS test statistics can be modified so that they are valid, at least asymptotically.
8.2 Heteroskedasticity-Robust Inference after OLS Estimation
Because testing hypotheses is such an important component of any econometric analysis
and the usual OLS inference is generally faulty in the presence of heteroskedasticity, we
must decide if we should entirely abandon OLS. Fortunately, OLS is still useful. In the
last two decades, econometricians have learned how to adjust standard errors and t, F, and
LM statistics so that they are valid in the presence of heteroskedasticity of unknown
form. This is very convenient because it means we can report new statistics that work
regardless of the kind of heteroskedasticity present in the population. The methods in this
section are known as heteroskedasticity-robust procedures because they are valid—at least
in large samples—whether or not the errors have constant variance, and we do not need
to know which is the case.
We begin by sketching how the variances, $\mathrm{Var}(\hat{\beta}_j)$, can be estimated in the presence of
heteroskedasticity. A careful derivation of the theory is well beyond the scope of this text,
but the application of heteroskedasticity-robust methods is very easy now because many
statistics and econometrics packages compute these statistics as an option.
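For example, in Python's statsmodels package (one such option; the data below are purely synthetic, and the variable names wage, educ, and exper are placeholders), heteroskedasticity-robust standard errors are requested through the cov_type argument of fit:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative data only; in practice df would hold your own sample.
rng = np.random.default_rng(1)
df = pd.DataFrame({"educ": rng.uniform(8, 20, 200),
                   "exper": rng.uniform(0, 30, 200)})
df["wage"] = 1 + 0.5 * df["educ"] + 0.03 * df["exper"] \
             + rng.normal(scale=df["educ"] / 10)

# OLS point estimates are unchanged; only the standard errors,
# t statistics, and confidence intervals are computed robustly.
fit = smf.ols("wage ~ educ + exper", data=df).fit(cov_type="HC1")
print(fit.summary())
```

The reported coefficient estimates are the same as under ordinary OLS; only the standard errors and the statistics built from them change.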