Here $\beta_0$ and $\beta_1$ are the population intercept and slope parameters,
respectively, and $\epsilon_i$ is the error. We assume that the errors are not
correlated and have mean 0 and variance $\sigma^2$; thus
$E\,y_i = \beta_0 + \beta_1 x_i$ and $\operatorname{Var}\,y_i = \sigma^2$. The goal
is to estimate this linear model, that is, to estimate $\beta_0$, $\beta_1$, and
$\sigma^2$ from the
n observed pairs. To put our discussion in context, we consider an example
concerning a study of factors affecting patterns of insulin-dependent diabetes
mellitus in children.
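Before turning to the example, note that the two moment statements above follow in one step from the model. The derivation below is a sketch that assumes the model has the form $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ (the form implied by the stated mean) and that the $x_i$ are treated as fixed constants:

```latex
\begin{align*}
E\,y_i &= E(\beta_0 + \beta_1 x_i + \epsilon_i)
        = \beta_0 + \beta_1 x_i + E\,\epsilon_i
        = \beta_0 + \beta_1 x_i, \\
\operatorname{Var}\,y_i &= \operatorname{Var}(\beta_0 + \beta_1 x_i + \epsilon_i)
        = \operatorname{Var}\,\epsilon_i
        = \sigma^2 .
\end{align*}
```

The constant term $\beta_0 + \beta_1 x_i$ shifts the mean but contributes nothing to the variance, which is why all responses share the same variance $\sigma^2$.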
Example 16.1. Diabetes Mellitus in Children. Diabetes mellitus is a
condition characterized by hyperglycemia resulting from the body’s inability to
use blood glucose for energy. In type 1 diabetes, the pancreas no longer makes
insulin and therefore blood glucose cannot enter the cells to be used for energy.
The objective of the study was to investigate the dependence of the level of
serum C-peptide on various other factors in order to understand the patterns of
residual insulin secretion. C-peptide is a protein produced by the beta cells of
the pancreas whenever insulin is made. Thus, the level of C-peptide in the blood
is an index of insulin production.
Part of the data from Sockett et al. (1987), discussed in the context
of statistical modeling by Hastie and Tibshirani (1990), is given next. The
response is the logarithm of C-peptide concentration (pmol/ml)
at the time of diagnosis, and the predictor is the base deficit, a measure of
acidity.
Deficit (x)          −8.1  −16.1   −0.9   −7.8  −29.0  −19.2  −18.9  −10.6   −2.8  −25.0   −3.1
Log C-peptide (y)     4.8    4.1    5.2    5.5    5.0    3.4    3.4    4.9    5.6    3.7    3.9

Deficit (x)          −7.8  −13.9   −4.5  −11.6   −2.1   −2.0   −9.0  −11.2   −0.2   −6.1   −1.0
Log C-peptide (y)     4.5    4.8    4.9    3.0    4.6    4.8    5.5    4.5    5.3    4.7    6.6

Deficit (x)          −3.6   −8.2   −0.5   −2.0   −1.6  −11.9   −0.7   −1.2  −14.3   −0.8  −16.8
Log C-peptide (y)     5.1    3.9    5.7    5.1    5.2    3.7    4.9    4.8    4.4    5.2    5.1

Deficit (x)          −5.1   −9.5  −17.0   −3.3   −0.7   −3.3  −13.6   −1.9  −10.0  −13.5
Log C-peptide (y)     4.6    3.9    5.1    5.1    6.0    4.9    4.1    4.6    4.9    5.1
We follow this example in MATLAB through an annotated, step-by-step listing of
the code and output of cpeptide.m. For more sophisticated analyses, MATLAB has
quite advanced built-in regression tools, such as regress, regstats, robustfit,
and stepwise, along with many other more or less specialized fitting and
diagnostic tools.
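As a rough illustration of one of these tools, the sketch below calls regress on the first six (x, y) pairs from the table and cross-checks the result against the backslash operator. It assumes the Statistics Toolbox is installed. The names X, b, and bCheck are illustrative and are not taken from cpeptide.m.

```matlab
% Sketch only: assumes the Statistics Toolbox (regress) is available.
% First six (x, y) pairs from the table, used to keep the example short.
x = [-8.1 -16.1 -0.9 -7.8 -29.0 -19.2]';
y = [ 4.8   4.1  5.2  5.5   5.0   3.4]';

% regress does not add an intercept itself, so supply a column of ones.
X = [ones(size(x)) x];

b = regress(y, X);      % b(1): intercept estimate, b(2): slope estimate

% Cross-check against the least-squares solution from the backslash operator.
bCheck = X \ y;
fprintf('regress:   %8.4f %8.4f\n', b);
fprintf('backslash: %8.4f %8.4f\n', bCheck);
```

The two printed lines should agree up to rounding, since both solve the same least-squares problem. The step-by-step approach in the text is useful precisely because it exposes the quantities that such a call computes internally.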
After importing the data, we specify p, which is the number of parameters,
rename the variables, and find the sample size.
% Base deficit (predictor x), 43 observations
Deficit = [-8.1 -16.1 -0.9 -7.8 -29.0 -19.2 -18.9 -10.6 -2.8 ...
    -25.0 -3.1 -7.8 -13.9 -4.5 -11.6 -2.1 -2.0 -9.0 -11.2 -0.2 ...
    -6.1 -1 -3.6 -8.2 -0.5 -2.0 -1.6 -11.9 -0.7 -1.2 -14.3 -0.8 ...
    -16.8 -5.1 -9.5 -17.0 -3.3 -0.7 -3.3 -13.6 -1.9 -10.0 -13.5];
% Log C-peptide concentration at diagnosis (response y), 43 observations
logCpeptide = [4.8 4.1 5.2 5.5 5 3.4 3.4 4.9 5.6 3.7 3.9 ...
    4.5 4.8 4.9 3.0 4.6 4.8 5.5 4.5 5.3 4.7 6.6 5.1 3.9 ...
    5.7 5.1 5.2 3.7 4.9 4.8 4.4 5.2 5.1 4.6 3.9 5.1 5.1 ...
    6.0 4.9 4.1 4.6 4.9 5.1];
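The remaining setup steps described above (specifying p, renaming the variables, finding the sample size) might look like the following. This is a minimal sketch, not the listing from cpeptide.m. The names x, y, and n are assumptions, as are the choice of column vectors and the reading of p as counting the intercept and slope. The guard at the top lets the block run on its own with a five-pair stand-in.

```matlab
% Sketch only: x, y, n and the column orientation are assumptions.
% If the full vectors from the listing above are not in the workspace,
% fall back to the first five pairs of the table so the block runs alone.
if ~exist('Deficit', 'var') || ~exist('logCpeptide', 'var')
    Deficit     = [-8.1 -16.1 -0.9 -7.8 -29.0];
    logCpeptide = [ 4.8   4.1  5.2  5.5   5.0];
end

p = 2;                  % parameters in the mean: intercept and slope
x = Deficit(:);         % predictor as a column vector
y = logCpeptide(:);     % response as a column vector
n = numel(x);           % sample size (43 for the full data set)

% Each x must be paired with exactly one y.
assert(n == numel(y), 'x and y must have the same length');
```

Checking the lengths right after data entry is a cheap way to catch a dropped or duplicated value in hand-typed vectors.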