
maximize the probability that the observed data are in fact observed.
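Clearly, the maximum of $P(\mathbf{d}^{\mathrm{obs}})$ occurs when the argument of the
exponential is a maximum, or when the quantity given by

$$(\mathbf{d} - \mathbf{G}\mathbf{m})^{T} [\mathrm{cov}\,\mathbf{d}]^{-1} (\mathbf{d} - \mathbf{G}\mathbf{m}) \tag{5.6}$$
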
is a minimum. But this expression is just a weighted measure of the
length of the prediction error. The maximum likelihood estimate of the
model parameters is nothing but the weighted least squares solution,
where the weighting matrix is the inverse of the covariance matrix of
the data (in the notation of Chapter 3,
$\mathbf{W}_e = [\mathrm{cov}\,\mathbf{d}]^{-1}$).
If the data happen to be uncorrelated and all have equal variance, then
$[\mathrm{cov}\,\mathbf{d}] = \sigma_d^2 \mathbf{I}$, and the
maximum likelihood solution is the simple least squares solution. If
the data are uncorrelated but their variances are all different (say,
$\sigma_{d_i}^2$), then the prediction error is given by

$$E = \sum_{i=1}^{N} \sigma_{d_i}^{-2} e_i^2 \tag{5.7}$$

where $e_i = (d_i^{\mathrm{obs}} - d_i^{\mathrm{pre}})$
is the prediction error for each datum. Each
measurement is weighted by the reciprocal of its variance; the most
certain data are weighted most.
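
As a rough illustration of the weighting described above, the following sketch solves a small weighted least squares problem with NumPy. The straight-line model, the noise levels, and the helper name `weighted_least_squares` are assumptions made up for this example and do not come from the text. It checks that the sum in Eq. (5.7) equals the quadratic form in Eq. (5.6) when the covariance matrix is diagonal, and that equal variances reduce the estimate to simple least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical straight-line problem d = m1 + m2*z (illustrative values only).
z = np.linspace(0.0, 10.0, 20)
G = np.column_stack([np.ones_like(z), z])
m_true = np.array([1.0, 2.0])

# Uncorrelated data whose standard deviations sigma_di alternate
# between precise (0.1) and imprecise (2.0) measurements.
sigma_d = np.where(np.arange(z.size) % 2 == 0, 0.1, 2.0)
d_obs = G @ m_true + sigma_d * rng.standard_normal(z.size)


def weighted_least_squares(G, d, cov_d):
    """Minimize (d - Gm)^T [cov d]^{-1} (d - Gm) via the normal equations."""
    W_e = np.linalg.inv(cov_d)  # weighting matrix W_e = [cov d]^{-1}
    return np.linalg.solve(G.T @ W_e @ G, G.T @ W_e @ d)


cov_d = np.diag(sigma_d**2)
m_est = weighted_least_squares(G, d_obs, cov_d)

# Weighted prediction error: the sum of Eq. (5.7) and the quadratic
# form of Eq. (5.6) agree for a diagonal covariance matrix.
e = d_obs - G @ m_est
E_sum = np.sum(e**2 / sigma_d**2)
E_quad = e @ np.linalg.inv(cov_d) @ e

# With equal variances, [cov d] = sigma_d^2 I, the weighted solution
# coincides with the simple least squares solution.
m_equal = weighted_least_squares(G, d_obs, 0.5**2 * np.eye(z.size))
m_simple, *_ = np.linalg.lstsq(G, d_obs, rcond=None)

print("weighted estimate:", m_est)
print("E (sum form), E (quadratic form):", E_sum, E_quad)
print("equal-variance estimate matches simple least squares:",
      np.allclose(m_equal, m_simple))
```

Forming the explicit inverse of the covariance matrix is acceptable for a small diagonal example like this one; for large or poorly conditioned covariance matrices a factorization-based solve would usually be preferable.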
We have justified the use of the $L_2$ norm through the application of
probability theory. The least squares procedure for minimizing the
$L_2$ norm of the prediction error makes sense if the data are
uncorrelated, have equal variance, and obey Gaussian statistics. If the
data are not Gaussian, then other measures of prediction error may be
more appropriate.
5.3 A Priori Distributions
If the linear problem is underdetermined, then the least squares
inverse does not exist. From the standpoint of probability theory, the
distribution of the data $P(\mathbf{d}^{\mathrm{obs}})$ has no
well-defined maximum with respect to variations of the model
parameters. At best, it has a ridge of maximum probability (Fig. 5.4).
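
The ridge of maximum probability can be seen in a toy problem. The sketch below assumes a hypothetical case with one datum and two model parameters, d = m1 + m2, with Gaussian data of made-up variance; none of these values come from the text. It shows that G^T G is singular and that the exponent of the Gaussian distribution stays constant along the direction in which m1 + m2 is unchanged.

```python
import numpy as np

# Hypothetical underdetermined problem: one datum, two model parameters,
# d = m1 + m2 (values chosen only for illustration).
G = np.array([[1.0, 1.0]])
d_obs = np.array([2.0])
sigma_d = 0.5


def log_likelihood(m):
    """Exponent of the Gaussian: -1/2 (d - Gm)^T [cov d]^{-1} (d - Gm)."""
    e = d_obs - G @ m
    return -0.5 * float(e @ e) / sigma_d**2


# G^T G is 2x2 but has rank 1, so the least squares inverse
# [G^T G]^{-1} does not exist.
GTG = G.T @ G
rank = np.linalg.matrix_rank(GTG)

# Moving along (1, -1) leaves m1 + m2, and hence the predicted datum,
# unchanged: every point on this line is equally probable (the ridge).
null_dir = np.array([1.0, -1.0]) / np.sqrt(2.0)
m0 = np.array([1.0, 1.0])  # one model that fits the datum exactly
ridge_values = [log_likelihood(m0 + t * null_dir) for t in (-5.0, 0.0, 5.0)]

# Stepping off the ridge changes the prediction and lowers the probability.
off_ridge = log_likelihood(m0 + np.array([1.0, 1.0]))

print("rank of G^T G:", rank)
print("log-likelihood along the ridge:", ridge_values)
print("log-likelihood off the ridge:", off_ridge)
```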
To solve this underdetermined problem we must add a priori
information that causes the distribution to have a well-defined peak.
One way to accomplish this is to write the a priori information about
the model parameters as a probability distribution $P_A(\mathbf{m})$, where the