which is a random variable because it depends on the outcome of the random sample {Y_1, Y_2, …, Y_n}. The maximum likelihood estimator of θ, call it W, is the value of θ that maximizes the likelihood function. (This is why we write L as a function of θ, followed by the random sample.) Clearly, this value depends on the random sample. The maximum likelihood principle says that, out of all the possible values for θ, the value that makes the likelihood of the observed data largest should be chosen. Intuitively, this is a reasonable approach to estimating θ.
Usually, it is more convenient to work with the log-likelihood function, which is obtained by taking the natural log of the likelihood function:

log[L(θ; Y_1, …, Y_n)] = ∑_{i=1}^{n} log[f(Y_i; θ)],        (C.16)

where we use the fact that the log of the product is the sum of the logs. Because (C.16) is the sum of independent, identically distributed random variables, analyzing estimators that come from (C.16) is relatively easy.
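To make the maximum likelihood principle concrete, the following minimal sketch (not part of the text) maximizes the log-likelihood in (C.16) numerically for an assumed exponential population, f(y; θ) = θ exp(−θy), and compares the result with the closed-form MLE 1/Ȳ; the simulated data and the use of scipy are illustrative choices only.

    # Sketch: numerically maximize the log-likelihood (C.16) for an assumed
    # exponential population f(y; theta) = theta * exp(-theta * y).
    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(0)
    y = rng.exponential(scale=1 / 2.0, size=500)   # simulated sample, true theta = 2

    def neg_log_likelihood(theta):
        # minus sum_i log f(Y_i; theta) = -(n*log(theta) - theta*sum_i Y_i)
        return -(y.size * np.log(theta) - theta * y.sum())

    res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 50.0), method="bounded")
    print("numerical MLE of theta:", res.x)
    print("closed-form MLE 1/Ybar:", 1.0 / y.mean())

The numerical maximizer and 1/Ȳ agree up to optimization tolerance, which is the sense in which the MLE is the value of θ that makes the observed data most likely.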
Maximum likelihood estimation (MLE) is usually consistent and sometimes unbiased. But so are many other estimators. The widespread appeal of MLE is that it is generally the most asymptotically efficient estimator when the population model f(y; θ) is correctly specified. In addition, the MLE is sometimes the minimum variance unbiased estimator; that is, it has the smallest variance among all unbiased estimators of θ. (See Larsen and Marx [1986, Chapter 5] for verification of these claims.)
In Chapter 17, we will need maximum likelihood to estimate the parameters of more advanced econometric models. In econometrics, we are almost always interested in the distribution of Y conditional on a set of explanatory variables, say, X_1, X_2, …, X_k. Then, we replace the density in (C.16) with f(Y_i | X_{i1}, …, X_{ik}; θ_1, …, θ_p), where this density is allowed to depend on p parameters, θ_1, …, θ_p. Fortunately, for successful application of maximum likelihood methods, we do not need to delve much into the computational issues or the large-sample statistical theory. Wooldridge (2002, Chapter 13) covers the theory of maximum likelihood estimation.
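Spelled out, the conditional analogue of (C.16) — stated here only for concreteness — is the conditional log-likelihood

∑_{i=1}^{n} log[f(Y_i | X_{i1}, …, X_{ik}; θ_1, …, θ_p)],

which is maximized jointly over the parameters θ_1, …, θ_p.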
Least Squares
A third kind of estimator, and one that plays a major role throughout the text, is called a least squares estimator. We have already seen an example of least squares: the sample mean, Ȳ, is a least squares estimator of the population mean, μ. We already know Ȳ is a method of moments estimator. What makes it a least squares estimator? It can be shown that the value of m that makes the sum of squared deviations

∑_{i=1}^{n} (Y_i − m)^2

as small as possible is m = Ȳ. Showing this is not difficult, but we omit the algebra.
For some important distributions, including the normal and the Bernoulli, the sample average Ȳ is also the maximum likelihood estimator of the population mean μ. Thus, the principles of least squares, method of moments, and maximum likelihood often result in the same estimator. In other cases, the estimators are similar but not identical.
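To illustrate that coincidence in the Bernoulli case (a sketch, not part of the original exposition): with f(y; θ) = θ^y (1 − θ)^{1−y}, the log-likelihood in (C.16) is

log L(θ) = ∑_{i=1}^{n} [Y_i log θ + (1 − Y_i) log(1 − θ)],

and setting its derivative, ∑_i Y_i/θ − (n − ∑_i Y_i)/(1 − θ), equal to zero yields the MLE θ̂ = Ȳ, the same estimator delivered by the method of moments and by least squares.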