STEP 2. Assume that the model already has $k$ variables, $x_1,\dots,x_k$, for some $k \ge 1$. Select the variable $x_{k+1}$ that gives the maximal increase to $R^2$ and refit the model.
STEP 3. Denote by $SSR(x_1,\dots,x_k)$ the regression sum of squares for a regression fitted with variables $x_1,\dots,x_k$. Then
$$R(x_{k+1}\mid x_1,\dots,x_k) = SSR(x_1,\dots,x_{k+1}) - SSR(x_1,\dots,x_k)$$
is the contribution of the $(k+1)$st variable, and it is considered significant if
$$R(x_{k+1}\mid x_1,\dots,x_k)/\mathrm{MSE} > F_{1,\,n-k-1,\,\alpha}. \qquad (16.5)$$
If relation (16.5) is satisfied, then variable $x_{k+1}$ is included in the model. Increase $k$ by one and go to STEP 2.
If relation (16.5) is not satisfied, then the contribution of $x_{k+1}$ is not significant, in which case go to STEP 4.
STEP 4. Stop with the model that has k variables. END
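A minimal MATLAB sketch of this forward selection loop follows, assuming y is an n-by-1 response and X an n-by-m matrix of candidate predictors (these names, and the choice of alpha, are illustrative and not from the text):

  [n, m] = size(X);
  Xf = [ones(n,1) X];                       % full model design matrix
  MSE = sum((y - Xf*(Xf\y)).^2)/(n - m - 1);% MSE from the full model, as in (16.5)
  alpha = 0.05;
  SST = sum((y - mean(y)).^2);
  inmod = []; SSRold = 0;                   % selected variables and their SSR
  while numel(inmod) < m
      best = 0; bestSSR = SSRold;
      for j = setdiff(1:m, inmod)           % STEP 2: try each remaining variable
          Xc = [ones(n,1) X(:, [inmod j])];
          SSR = SST - sum((y - Xc*(Xc\y)).^2);
          if SSR > bestSSR, best = j; bestSSR = SSR; end
      end
      k = numel(inmod);
      % STEP 3: partial F-test for the contribution of x_{k+1}
      if best > 0 && (bestSSR - SSRold)/MSE > finv(1 - alpha, 1, n - k - 1)
          inmod = [inmod best]; SSRold = bestSSR;  % include it, return to STEP 2
      else
          break                             % not significant: STEP 4, stop
      end
  end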
The MSE in (16.5) was estimated from the full model. Note that the forward selection algorithm is “greedy” and chooses the single best improving variable at each step. This, of course, may not lead to the optimal model since in reality variable $x_1$, which is the best for one-variable models, may not be included in the best two-variable model.
Backward stepwise regression starts with the full model and removes variables with insignificant contributions to $R^2$. Seldom do these two approaches end with the same candidate model.
MATLAB’s Statistics Toolbox has two functions for stepwise regression: stepwisefit, a function that proceeds automatically from a specified initial model and entrance/exit tolerances, and stepwise, an interactive tool that allows you to explore individual steps in the process.
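For instance, an automatic run might look as follows, with X and y as above; the entrance/exit tolerance values here are illustrative:

  [b, se, pval, inmodel, stats] = stepwisefit(X, y, ...
          'penter', 0.05, 'premove', 0.10);
  find(inmodel)     % indices of the variables retained in the final model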
An additional criterion for the goodness of a model is the Mallows $C_p$. This criterion evaluates a proposed model with $k$ variables and $p = k+1$ parameters. The Mallows $C_p$ is calculated as
$$C_p = (n-p)\,\frac{s^2}{\hat\sigma^2} - n + 2p,$$
where $s^2$ is the MSE of the candidate model and $\hat\sigma^2$ is an estimator of $\sigma^2$, usually taken to be the best available estimate. The MSE of the full model is typically used as $\hat\sigma^2$.
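As a quick sketch, $C_p$ for a single candidate model can be computed along these lines (X and y as before; the candidate subset sub is hypothetical, chosen only for illustration):

  [n, m] = size(X);
  Xf = [ones(n,1) X];
  sig2hat = sum((y - Xf*(Xf\y)).^2)/(n - m - 1); % hat sigma^2: MSE of full model
  sub = [1 3];                        % hypothetical candidate predictors
  p  = numel(sub) + 1;                % p = k + 1 parameters (intercept included)
  Xc = [ones(n,1) X(:, sub)];
  s2 = sum((y - Xc*(Xc\y)).^2)/(n - p);          % s^2: MSE of candidate model
  Cp = (n - p)*s2/sig2hat - n + 2*p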
A common misinterpretation is that in $C_p$, $p$ is taken to be the number of predictors instead of the number of parameters. This is correct only for models without the intercept (or when the column of ones in the design matrix is declared a predictor).
Adequate models should have a small $C_p$ that is close to $p$. Typically, a plot of $C_p$ against $p$ for all models is made. The “southwesternmost” points close to the line $C_p = p$ correspond to adequate models. The $C_p$ criterion is also employed in forward and backward variable selection as a stopping rule.
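For a small number of candidate predictors m, the plot just described can be produced by looping over all subsets; a sketch, reusing sig2hat from the snippet above:

  [n, m] = size(X);
  pp = zeros(1, 2^m - 1); Cp = zeros(1, 2^m - 1);
  for code = 1:2^m - 1
      sub = find(bitget(code, 1:m));   % predictors in this candidate model
      p   = numel(sub) + 1;
      Xc  = [ones(n,1) X(:, sub)];
      s2  = sum((y - Xc*(Xc\y)).^2)/(n - p);
      pp(code) = p;
      Cp(code) = (n - p)*s2/sig2hat - n + 2*p;
  end
  plot(pp, Cp, 'o'), hold on
  plot(1:m+1, 1:m+1, '-'), hold off    % reference line C_p = p
  xlabel('p'), ylabel('C_p')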
Bayesian Multiple Regression. Next we revisit
fat.dat with some
Bayesian analyses. We selected four competing models and compared them