inverse. Consequently, I refer to $\mathbf{R}^{-}_{red}$ in (6.19) as the pseudoinverse of $\mathbf{R}_{red}$. The standardized principal components coefficients are therefore

$$\mathbf{b}^{s}_{pc} = \mathbf{R}^{-}_{red}\,\mathbf{r}_{xy} =
\begin{bmatrix} .2506 & .2506 \\ .2506 & .2506 \end{bmatrix}
\begin{bmatrix} .40925 \\ .38875 \end{bmatrix} =
\begin{bmatrix} .2 \\ .2 \end{bmatrix}.$$
Although these are not quite as close to the true values of .25 and .15 as are the ridge
estimates, they represent a vast improvement over the OLS estimates.
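The arithmetic above can be checked numerically. The sketch below is a minimal numpy illustration; the predictor correlation of roughly .9952 is an assumption backed out from the reported pseudoinverse entries (since $.2506 \approx .5/(1+r)$), not a value stated in the text.

```python
import numpy as np

# Assumed two-predictor correlation; r = .9952 is backed out from the
# reported pseudoinverse entries (.2506 = .5 / (1 + r)), not given in the text.
r = 0.9952
R = np.array([[1.0, r],
              [r, 1.0]])
r_xy = np.array([0.40925, 0.38875])   # correlations of each predictor with y

# Eigendecomposition of R; retain only the leading component.
lam, U = np.linalg.eigh(R)            # eigenvalues in ascending order
keep = [np.argmax(lam)]
R_red_inv = sum(np.outer(U[:, i], U[:, i]) / lam[i] for i in keep)

b_pc = R_red_inv @ r_xy               # both entries come out near .20
print(np.round(R_red_inv, 4))         # entries near .2506
print(np.round(b_pc, 4))
```

Because the two predictors are almost perfectly correlated, the single retained component assigns them nearly identical coefficients, which is exactly what the worked example shows.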
In general, then, the principal components estimator of $\boldsymbol{\beta}^{s}$ is of the form $\mathbf{b}^{s}_{pc} = \mathbf{R}^{-}_{red}\,\mathbf{r}_{xy}$, where

$$\mathbf{R}^{-}_{red} = \sum_{i=1}^{J} \lambda_i^{-1}\,\mathbf{u}_i\mathbf{u}_i'$$

and $J < K$. Some authors recommend using the percentage of variance accounted for
by the J retained components as a guide to how many components to omit (e.g., Hadi
and Ling, 1998). However, typically, dropping the last component, which is associ-
ated with the smallest eigenvalue, will be sufficient. As with ridge regression, the
bias of the principal components estimator is easy to see, since
$$E(\mathbf{b}^{s}_{pc}) = E(\mathbf{R}^{-}_{red}\,\mathbf{r}_{xy}) = \mathbf{R}^{-}_{red}\,\frac{1}{n}\mathbf{Z}'E(\mathbf{y}_z) = \mathbf{R}^{-}_{red}\,\frac{1}{n}\mathbf{Z}'\mathbf{Z}\,\boldsymbol{\beta}^{s} = \mathbf{R}^{-}_{red}\,\mathbf{R}_{xx}\,\boldsymbol{\beta}^{s} \neq \boldsymbol{\beta}^{s}.$$
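Both the estimator and its bias can be illustrated in a few lines of numpy. The correlation matrix and true coefficients below are hypothetical stand-ins (chosen to mimic the .25/.15 example), not the book's data; the sketch also computes the share of variance carried by the retained component, the guide some authors recommend.

```python
import numpy as np

# Hypothetical values for illustration (not the book's data):
# two standardized predictors correlated at .9, true betas .25 and .15.
R_xx = np.array([[1.0, 0.9],
                 [0.9, 1.0]])
beta_s = np.array([0.25, 0.15])

lam, U = np.linalg.eigh(R_xx)             # eigenvalues ascending: ~0.1, ~1.9
order = np.argsort(lam)[::-1]
keep = order[:-1]                         # drop the smallest-eigenvalue component

# Share of variance carried by the retained components (trace of R_xx = K).
var_share = lam[keep].sum() / lam.sum()   # ~ 1.9 / 2 = .95

# Pseudoinverse built from the retained components only.
R_red_inv = sum(np.outer(U[:, i], U[:, i]) / lam[i] for i in keep)

# E(b_pc) = R_red^- R_xx beta_s, which differs from beta_s: the estimator is biased.
expected_b = R_red_inv @ R_xx @ beta_s
print(np.round(var_share, 4))             # ~ .95
print(np.round(expected_b, 4))            # ~ [.2, .2], not [.25, .15]
```

Here the retained component carries 95 percent of the predictor variance, yet the expected coefficients are pulled toward a common value, making the bias of the estimator concrete.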
Body Fat and NSFH Data, Revisited. The last column in panel A of Tables 6.7 and
6.8 presents the unstandardized principal components estimates for the body fat and
NSFH data, respectively. Notice that the principal components estimates are quite
close to the ridge regression estimates, and both are substantially different from the
OLS estimates for the body fat data. For the NSFH data, all of the estimates, whether
OLS, ridge, or principal components, are fairly similar. Perhaps the key substantive
difference between OLS and the other estimators is that the latter have more intuitive signs for the effect of thigh circumference in the body fat data and for the effect
of female age in the NSFH data. Again, the primary limitation with the latter esti-
mators is that inferences to the population parameters cannot be made. On the other
hand, the ridge and principal components estimators are probably closer than the
OLS coefficients to the true values of the parameters. It should be mentioned that not
all software makes these two techniques available to the analyst. SAS offers both
estimators as options to the OLS regression procedure, invoked using the keywords
RIDGE (for ridge regression) and PCOMIT (for principal components regression).
Concluding Comments. Although I confine my discussion of influential observa-
tions as well as collinearity problems and remedies to this chapter, these issues apply
to all generalized linear models. Influence diagnostics have been devised for tech-
niques such as logistic regression (Pregibon, 1981) and are included in such software
packages as SAS. Collinearity problems can plague any model that employs multiple
regressors; however, not all procedures offer collinearity diagnostics. On the other
hand, multicollinearity is strictly a problem in the design matrix and does not depend
on the nature of the link to the response. Therefore, it can always be diagnosed with
REGRESSION DIAGNOSTICS II: MULTICOLLINEARITY 241