Two main steps are involved when applying the normal equations (Equation 2.46):
(1) construction of A, and
(2) solution of $c = (A^T A)^{-1} A^T y$, computed in MATLAB as c = (A'*A)\A'*y.
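The two steps above can be sketched in MATLAB for a straight-line model y = c1 + c2*x. The vectors x and y below are made-up illustrative values that lie exactly on y = 1 + 2x; they are not data from the text.

```matlab
% Hypothetical data lying exactly on y = 1 + 2x (illustration only)
x = [0; 1; 2; 3];
y = [1; 3; 5; 7];

% Step 1: construct A, one column per coefficient of y = c1 + c2*x
A = [ones(size(x)), x];

% Step 2: solve the normal equations (A'*A)*c = A'*y
c = (A'*A)\A'*y;     % c(1) is the intercept, c(2) is the slope
```

For this data the solve should return an intercept of 1 and a slope of 2, up to round-off. The backslash operator solves the linear system without forming the inverse of A'*A explicitly.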
2.9.2 Coefficient of determination and quality of fit
The mean value $\bar{y}$ can be used to represent a set of data points $y_i$, and is a useful
statistic if the scatter can be explained by the stochastic nature of a process or
errors in measurement. If the data show a trend with respect to a change in a
property or condition (i.e. the independent variable(s)), then the mean value will
not capture the nature of this trend. A model fitted to the data is expected to
capture the observed trend either wholly or partially. The outcome of the fit will
depend on many factors, such as the type of model chosen and the magnitude of
error in the data. A “best-fit” model is usually a superior way to approximate data
than simply stating the mean of all $y_i$ points, since the model contains at least one
adjustable coefficient and a term that represents the nature of dependency of y on
the independent variable(s).
The difference $(y_i - \bar{y})$ (also called the deviation) measures how far each data
point lies from the mean. The sum of the squared differences $\sum (y_i - \bar{y})^2$ is a
useful quantitative measure of the spread of the data about the mean. When a
fitted function is used to approximate the data, the sum of the squared residuals is
$\|r\|^2 = \sum (y_i - \hat{y}_i)^2$. The coefficient of determination $R^2$ is popularly used
to determine the quality of the fit; $R^2$ conveys the improvement attained in using the model
to describe the data, compared to using a horizontal line that passes through the
mean. Calculation of $R^2$ involves comparing the deviations of the y data points from
the model prediction, i.e. $\sum (y_i - \hat{y}_i)^2$, with the deviations of the y data points
from the mean, i.e. $\sum (y_i - \bar{y})^2$. Note that $R^2$ is a number between zero and one
and is calculated as shown in Equation (2.47):
$$R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} = 1 - \frac{\|r\|^2}{\sum (y_i - \bar{y})^2}. \tag{2.47}$$
The summation subscripts have been omitted to improve readability.
If $\sum (y_i - \hat{y}_i)^2$ is much less than $\sum (y_i - \bar{y})^2$, then the model is a
better approximation to the data than the mean of the data. In that case, $R^2$ will be close
to unity. If the model poorly approximates the data, the deviations of the data points from
the model will be comparable to the data variation about the mean and $R^2$ will be close to zero.
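As a rough illustration of Equation (2.47), the MATLAB sketch below computes $R^2$ from a vector of observations and a vector of model predictions. The numbers are hypothetical, and the names yhat, ybar, SSR and SST are labels chosen here for the sketch rather than notation from the text.

```matlab
% Hypothetical observations and model predictions (illustration only)
y    = [2; 4; 5; 9];
yhat = [1.7; 3.9; 6.1; 8.3];    % assumed output of a fitted straight line

ybar = mean(y);                  % mean of the data
SSR  = sum((y - yhat).^2);       % ||r||^2, squared residuals about the model
SST  = sum((y - ybar).^2);       % squared deviations about the mean

R2 = 1 - SSR/SST;                % Equation (2.47)
```

Here the squared residuals sum to 1.8 and the squared deviations about the mean sum to 26, so $R^2$ is about 0.93: the model accounts for most of the spread that the mean alone leaves unexplained.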
Example 2.7 (continued)
We readdress the problem of fitting the data set to a straight line, but this time Equation (2.46) is used
instead of Equation (2.36). For this system of equations
$$A = \begin{bmatrix} 1 & 1.0 \\ 1 & 2.0 \\ 1 & 3.0 \\ 1 & 4.0 \\ 1 & 5.0 \\ 1 & 6.0 \end{bmatrix} \quad \text{and} \quad y = \begin{bmatrix} 1.6 \\ 4.2 \\ 6.6 \\ 8.7 \\ 11.5 \\ 13.7 \end{bmatrix}.$$
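One way to carry out this solve in MATLAB is sketched below, using the A and y given for this example. The residual vector r is an extra variable added here for illustration.

```matlab
% A and y as given in Example 2.7
A = [1 1.0; 1 2.0; 1 3.0; 1 4.0; 1 5.0; 1 6.0];
y = [1.6; 4.2; 6.6; 8.7; 11.5; 13.7];

% Normal equations (Equation 2.46)
c = (A'*A)\A'*y;     % c(1): intercept, c(2): slope

% Residuals of the fitted line (added for illustration)
r = y - A*c;
```

Because the first column of A is all ones, c(1) is the intercept and c(2) is the slope. Working the sums by hand gives a slope of 253.5/105, about 2.414, and an intercept of about -0.733, which the computed c should reproduce up to round-off.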