11.3 VARIABLE SELECTION PROCEDURES
Health sciences researchers contemplating the use of multiple regression analysis to solve
problems usually find that they have a large number of variables from which to select
the independent variables to be employed as predictors of the dependent variable. Such
investigators will want to include in their model as many variables as possible in order
to maximize the model’s predictive ability. The investigator must realize, however, that
adding another independent variable to a set of independent variables always increases
the coefficient of determination, R². Therefore, independent variables should not be added
to the model indiscriminately, but only for good reason. In most situations, for example,
some potential predictor variables are more expensive than others in terms of data-
collection costs. The cost-conscious investigator, therefore, will not want to include an
expensive variable in a model unless there is evidence that it makes a worthwhile
contribution to the predictive ability of the model.
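To see why R² cannot decrease when a variable is added, consider the following small illustration (a sketch of ours, not part of the text; it assumes Python with NumPy, and the simulated variables and the helper r_squared are purely illustrative). It fits a simulated data set with one relevant predictor, then refits after adding a predictor that is pure noise, and prints both values of R²; the second can never be smaller than the first.

import numpy as np

rng = np.random.default_rng(0)
n = 30
x1 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 + rng.normal(size=n)   # y depends on x1 only
x2 = rng.normal(size=n)                   # an irrelevant candidate predictor

def r_squared(X, y):
    """R-squared from an ordinary least-squares fit of y on the columns of X."""
    X = np.column_stack([np.ones(len(y)), X])     # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

print(r_squared(x1, y))                           # R-squared with x1 alone
print(r_squared(np.column_stack([x1, x2]), y))    # at least as large with x2 added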
The investigator who wishes to use multiple regression analysis most effectively
must be able to employ some strategy for making intelligent selections from among
those potential predictor variables that are available. Many such strategies are in current
use, and each has its proponents. The strategies vary in terms of complexity and
the tedium involved in their employment. Unfortunately, the strategies do not always
lead to the same solution when applied to the same problem.
Stepwise Regression Perhaps the most widely used strategy for selecting independent
variables for a multiple regression model is the stepwise procedure. The procedure
consists of a series of steps. At each step of the procedure, each variable then in the
model is evaluated to see if, according to specified criteria, it should remain in the model.
Suppose, for example, that we wish to perform stepwise regression for a model
containing k predictor variables. The criterion measure is computed for each variable. Of
all the variables that do not satisfy the criterion for inclusion in the model, the one that
least satisfies the criterion is removed from the model. If a variable is removed in this
step, the regression equation for the smaller model is calculated and the criterion
measure is computed for each variable now in the model. If any of these variables fail to
satisfy the criterion for inclusion in the model, the one that least satisfies the criterion is
removed. If a variable is removed at this step, the variable that was removed in the first
step becomes eligible for reentry into the model, and the evaluation procedure is continued.
The process continues until no more variables can be entered or removed.
The nature of the stepwise procedure is such that, although a variable may be
deleted from the model in one step, it is evaluated for possible reentry into the model in
subsequent steps.
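The following sketch summarizes this logic in code (a minimal illustration of ours, not MINITAB's implementation; it assumes Python with NumPy, uses the partial F statistic as the criterion measure, and the names stepwise, partial_f, f_in, and f_out are ours). With f_in = f_out = 4 it mimics the default cutoff discussed next.

import numpy as np

def _sse(X, y):
    """Residual sum of squares from an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X]) if X.size else np.ones((len(y), 1))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

def partial_f(X, y, cols, j):
    """Partial F statistic for variable j, given the other variables in cols."""
    sse_full = _sse(X[:, cols], y)
    sse_reduced = _sse(X[:, [c for c in cols if c != j]], y)
    df_error = len(y) - len(cols) - 1               # n - p - 1 for the full model
    return (sse_reduced - sse_full) / (sse_full / df_error)

def stepwise(X, y, f_in=4.0, f_out=4.0, max_steps=100):
    """Enter and remove predictors by partial F until no change is possible."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(max_steps):                      # cap the steps as a safeguard
        changed = False
        # Removal step: evaluate every variable currently in the model.
        if selected:
            fs = {j: partial_f(X, y, selected, j) for j in selected}
            worst = min(fs, key=fs.get)
            if fs[worst] < f_out:
                selected.remove(worst)
                remaining.append(worst)             # eligible for reentry later
                changed = True
        # Entry step: evaluate every variable not yet in the model.
        if not changed and remaining:
            fs = {j: partial_f(X, y, selected + [j], j) for j in remaining}
            best = max(fs, key=fs.get)
            if fs[best] > f_in:
                selected.append(best)
                remaining.remove(best)
                changed = True
        if not changed:                             # nothing entered or removed
            break
    return selected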
MINITAB's STEPWISE procedure, for example, uses the associated F statistic as
the evaluative criterion for deciding whether a variable should be deleted from or added
to the model. Unless otherwise specified, the cutoff value is F = 4. The printout of the
STEPWISE results contains t statistics (the square root of F) rather than F statistics.
At each step MINITAB calculates an F statistic for each variable then in the model. If
the F statistic for any of these variables is less than the specified cutoff value (4 if some
other value is not specified), the variable with the smallest F is removed from the model.
The regression equation is refitted for the reduced model, the results are printed, and
the procedure proceeds to the next step.
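Because the partial F statistic for a single coefficient equals the square of its t statistic, the default cutoff F = 4 corresponds to |t| = 2. The short sketch below (an illustration of ours using simulated data and NumPy, not MINITAB output) checks this numerically by comparing t² for one predictor with its partial F.

import numpy as np

rng = np.random.default_rng(1)
n, k = 40, 3
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([0.8, 0.0, -0.5]) + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])              # full design matrix
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
resid = y - Xd @ beta
mse = resid @ resid / (n - k - 1)                  # error mean square
se = np.sqrt(np.diag(mse * np.linalg.inv(Xd.T @ Xd)))
t1 = beta[1] / se[1]                               # t statistic for the first predictor

def sse(M, y):
    """Residual sum of squares for an OLS fit on design matrix M."""
    b, *_ = np.linalg.lstsq(M, y, rcond=None)
    return np.sum((y - M @ b) ** 2)

sse_full = np.sum(resid ** 2)
sse_reduced = sse(np.delete(Xd, 1, axis=1), y)     # drop the first predictor, refit
f1 = (sse_reduced - sse_full) / (sse_full / (n - k - 1))

print(t1 ** 2, f1)                                 # the two values agree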