
4.3.5 THE GAUSS–MARKOV THEOREM
We will now obtain a general result for the class of linear unbiased estimators of β.
THEOREM 4.2  Gauss–Markov Theorem
In the linear regression model with regressor matrix X, the least squares estimator b is the minimum variance linear unbiased estimator of β. For any vector of constants w, the minimum variance linear unbiased estimator of w′β in the regression model is w′b, where b is the least squares estimator.
Note that the theorem makes no use of Assumption A6, normality of the distribution
of the disturbances. Only A1 to A4 are necessary. A direct approach to proving this
important theorem would be to define the class of linear and unbiased estimators (estimators of the form Cy such that E[Cy | X] = β) and then find the member of that class that has the smallest variance. We will use an indirect method instead. We have already established that b is a linear unbiased estimator. We will now consider other linear unbiased estimators of β and show that any other such estimator has a variance at least as large.
Let b₀ = Cy be another linear unbiased estimator of β, where C is a K × n matrix. If b₀ is unbiased, then

E[Cy | X] = E[(CXβ + Cε) | X] = β,
which implies that CX = I. There are many candidates. For example, consider using
just the first K (or, any K) linearly independent rows of X. Then C = [X₀⁻¹ : 0], where X₀⁻¹ is the inverse of the matrix formed from those K rows of X. The covariance matrix of b₀ can be found by replacing (X′X)⁻¹X′ with C in (4-14); the result is Var[b₀ | X] = σ²CC′.
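To make this construction concrete, the following is a minimal numerical sketch; the use of NumPy, the simulated X, and the parameter values are illustrative assumptions, not part of the text. It forms C = [X₀⁻¹ : 0] from the first K rows of a small simulated X, verifies that CX = I, and compares the diagonal of σ²CC′ with that of σ²(X′X)⁻¹.

```python
import numpy as np

# Illustrative data only: n = 8 observations, K = 3 regressors, sigma^2 = 2 (all hypothetical).
rng = np.random.default_rng(0)
n, K, sigma2 = 8, 3, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])

# C = [X0^{-1} : 0], where X0 is the K x K matrix formed from the first K rows of X.
X0 = X[:K, :]
C = np.hstack([np.linalg.inv(X0), np.zeros((K, n - K))])

# Unbiasedness of b0 = Cy requires CX = I.
print(np.allclose(C @ X, np.eye(K)))               # True

# Conditional covariance matrices: sigma^2 CC' for b0, sigma^2 (X'X)^{-1} for b.
var_b0 = sigma2 * C @ C.T
var_b = sigma2 * np.linalg.inv(X.T @ X)
print(np.all(np.diag(var_b0) >= np.diag(var_b)))   # True, anticipating the theorem
```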
Now let D = C − (X′X)⁻¹X′, so Dy = b₀ − b. Then,
Var[b₀ | X] = σ²[(D + (X′X)⁻¹X′)(D + (X′X)⁻¹X′)′].
We know that CX = I = DX + (X′X)⁻¹(X′X), so DX must equal 0. Therefore,
Var[b₀ | X] = σ²(X′X)⁻¹ + σ²DD′ = Var[b | X] + σ²DD′.
Since a quadratic form in DD′ is q′DD′q = z′z ≥ 0, where z = D′q, the conditional covariance matrix of b₀ equals that of b plus a nonnegative definite matrix. Therefore, every quadratic form in Var[b₀ | X] is at least as large as the corresponding quadratic form in Var[b | X], which establishes the first result.
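As a numerical check on this decomposition (again a sketch under the same illustrative assumptions of simulated data, not anything taken from the text), one can form D = C − (X′X)⁻¹X′ directly, confirm that DX = 0, and confirm that Var[b₀ | X] − Var[b | X] equals σ²DD′ and is nonnegative definite.

```python
import numpy as np

# Illustrative data only (hypothetical values).
rng = np.random.default_rng(1)
n, K, sigma2 = 8, 3, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])

# One linear unbiased estimator b0 = Cy: the "first K rows" choice, so that CX = I.
C = np.hstack([np.linalg.inv(X[:K, :]), np.zeros((K, n - K))])

# Least squares weights (X'X)^{-1} X' and the difference D.
ls_weights = np.linalg.inv(X.T @ X) @ X.T
D = C - ls_weights

print(np.allclose(D @ X, np.zeros((K, K))))          # DX = 0

var_b = sigma2 * np.linalg.inv(X.T @ X)
var_b0 = sigma2 * C @ C.T
diff = var_b0 - var_b

print(np.allclose(diff, sigma2 * D @ D.T))           # Var[b0|X] - Var[b|X] = sigma^2 DD'
print(np.all(np.linalg.eigvalsh(diff) >= -1e-10))    # nonnegative definite (up to rounding)
```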
The proof of the second statement follows from the previous derivation, since the variance of w′b is a quadratic form in Var[b | X], and likewise for any b₀. This proves that each individual slope estimator bₖ is the best linear unbiased estimator of βₖ. (Let w be all zeros except for a one in the kth position.) The theorem is much broader than this, however, since the result also applies to every other linear combination of the elements of β.
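A brief Monte Carlo sketch of this implication, under the same illustrative assumptions (simulated data, hypothetical parameter values, and an arbitrary choice of w): across repeated draws of the disturbances, w′b and w′b₀ are both centered at w′β, but w′b has the smaller sampling variance.

```python
import numpy as np

# Illustrative data and parameter values only (hypothetical).
rng = np.random.default_rng(2)
n, K, sigma = 10, 3, 1.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, -0.5, 2.0])
w = np.array([0.0, 1.0, 0.0])                 # w picks out the second coefficient

ls_weights = np.linalg.inv(X.T @ X) @ X.T     # least squares: b = (X'X)^{-1} X' y
alt_weights = np.hstack([np.linalg.inv(X[:K, :]), np.zeros((K, n - K))])  # another C with CX = I

ls_draws, alt_draws = [], []
for _ in range(20000):
    y = X @ beta + sigma * rng.normal(size=n)
    ls_draws.append(w @ (ls_weights @ y))
    alt_draws.append(w @ (alt_weights @ y))

# Both estimators of w'beta are unbiased, but w'b has the smaller variance.
print(np.mean(ls_draws), np.mean(alt_draws))   # both near w @ beta = -0.5
print(np.var(ls_draws) < np.var(alt_draws))    # True
```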
4.3.6 THE IMPLICATIONS OF STOCHASTIC REGRESSORS
The preceding analysis is done conditionally on the observed data. A convenient method
of obtaining the unconditional statistical properties of b is to obtain the desired results
conditioned on X first and then find the unconditional result by “averaging” (e.g., by