630 16 Regression
16.4.2.2 Multicollinearity
The multicollinearity problem in regression concerns the correlation among the predictors. Suppose that matrix $X$ contains two collinear columns $x_i$ and $x_j = 2x_i$. Obviously, covariates $x_i$ and $x_j$ are linearly dependent, and $x_j$ does not bring any new information about the response. This collinearity makes matrix $X$ rank-deficient and $X'X$ singular, that is, not invertible, so the normal equations (16.3) have no unique solution. In practice, when multicollinearity is present, matrix $X'X$ is usually not exactly singular but near-singular, in the sense that its determinant is close to 0. This makes the inversion of $X'X$, and consequently the solution $\hat{\beta}$, very unstable. It happens either when two or more variables are highly correlated or when a variable has a small variance (and, in a sense, is correlated with the intercept).
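Both situations can be reproduced numerically. The NumPy sketch below is illustrative only: the simulated data, the seed, the noise levels and the helper `ols` are assumptions, not taken from the text. With an exact copy $x_j = 2x_i$ the matrix $X$ loses rank; with a nearly collinear $x_j$, $X'X$ is still invertible but badly conditioned, so refitting after a very small change in the response can move the coefficients on $x_i$ and $x_j$ far more than the change itself would suggest.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x_i = rng.normal(size=n)

# Exactly collinear case: x_j = 2 x_i, so X has a linearly dependent column
X_exact = np.column_stack([np.ones(n), x_i, 2 * x_i])
rank_exact = np.linalg.matrix_rank(X_exact)   # 2 instead of 3
print("rank of X:", rank_exact)

# Nearly collinear case: x_j = 2 x_i plus a tiny perturbation
x_j = 2 * x_i + 1e-4 * rng.normal(size=n)
X_near = np.column_stack([np.ones(n), x_i, x_j])
XtX = X_near.T @ X_near
print("det(X'X):", np.linalg.det(XtX))
print("cond(X'X):", np.linalg.cond(XtX))

def ols(X, y):
    """Hypothetical helper: solve the normal equations X'X beta = X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Response that truly depends only on x_i
y = 1 + 3 * x_i + rng.normal(scale=0.5, size=n)
beta_1 = ols(X_near, y)
# Refit after a very small perturbation of the response
beta_2 = ols(X_near, y + 0.01 * rng.normal(size=n))
print("beta_1:", np.round(beta_1, 2))
print("beta_2:", np.round(beta_2, 2))
```

Comparing `beta_1` and `beta_2` shows the instability; in practice one would diagnose it with the indices discussed next rather than by refitting.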
There are several indices of multicollinearity. We will discuss the condition number, which is a global measure, and the local condition index and the variance inflation factor, which are linked to a particular variable.
Let $\lambda_{(1)} \le \lambda_{(2)} \le \cdots \le \lambda_{(p)}$ be the ordered eigenvalues of $X'X$. The condition number is defined as the square root of the ratio of the largest and smallest eigenvalues:
$$K = \sqrt{\frac{\lambda_{(p)}}{\lambda_{(1)}}}.$$
Values of $K$ starting at around 10 are a cause for concern, values between 30 and 100 influence the results, and values over 100 indicate a serious collinearity problem.
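A minimal sketch of $K$ computed directly from the eigenvalues of $X'X$, on simulated data (the data and the function name `condition_number` are illustrative assumptions). Because the eigenvalues of $X'X$ are the squared singular values of $X$, $K$ coincides with the 2-norm condition number of $X$ that NumPy reports. Implementations often scale the columns of $X$ to unit length before computing $K$; this sketch does not.

```python
import numpy as np

def condition_number(X):
    """K = sqrt(lambda_(p) / lambda_(1)) for the eigenvalues of X'X."""
    eigvals = np.linalg.eigvalsh(X.T @ X)   # returned in ascending order
    return np.sqrt(eigvals[-1] / eigvals[0])

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # strongly correlated with x1
x3 = rng.normal(size=n)               # unrelated predictor
X = np.column_stack([np.ones(n), x1, x2, x3])

K = condition_number(X)
print(f"K = {K:.1f}")
# Eigenvalues of X'X are squared singular values of X, so K equals the
# 2-norm condition number of X computed by NumPy
print(np.isclose(K, np.linalg.cond(X)))
```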
The (local) condition index for variable $x_i$ is
$$K_i = \sqrt{\frac{\lambda_{(p)}}{\lambda_i}}.$$
Since the eigenvalues apportion the total variance among the variables, a large condition index means that the variance in variable $i$ is relatively small, which is a source of multicollinearity. Variables with indices that exceed 30 are problematic.
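The sketch below computes $\sqrt{\lambda_{(p)}/\lambda}$ for every eigenvalue of $X'X$ on the same kind of simulated data. To attach each index to variables, it assumes the common practice of inspecting the eigenvector paired with each eigenvalue: large loadings show which columns take part in the near-dependency. This pairing is an illustrative choice, not necessarily the exact procedure behind the text, and `condition_indices` is a hypothetical helper.

```python
import numpy as np

def condition_indices(X):
    """Return sqrt(lambda_(p) / lambda) for every eigenvalue of X'X,
    together with the corresponding eigenvectors (as columns)."""
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # ascending eigenvalues
    return np.sqrt(eigvals[-1] / eigvals), eigvecs

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # nearly a copy of x1
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])

indices, vectors = condition_indices(X)
for k, v in zip(indices, vectors.T):
    flag = "  <- exceeds 30" if k > 30 else ""
    print(f"index {k:8.2f}  loadings (1, x1, x2, x3) = {np.round(v, 2)}{flag}")
```

The largest index equals $K$, and its eigenvector loads heavily, with opposite signs, on `x1` and `x2`, pointing at the pair responsible for the collinearity.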
The variance inflation factor (VIF) measures the extent to which a particular variable $x_i$ is correlated with the rest of the predictors. It is defined as
$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2},$$
where $R_i^2$ is the coefficient of determination in the regression of $x_i$ on the rest of the predictors. VIFs exceeding 10 are considered serious. Computationally, one finds the correlation matrix of the predictors and inverts it; the diagonal elements of this inverse are the VIFs. Unfortunately, a VIF diagnostic can sometimes miss a problem, since the intercept is not included in the analysis.
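The two routes to the VIF described above can be checked against each other. In this sketch (simulated data; the function names are illustrative assumptions), `vif_by_regression` applies the definition $1/(1 - R_i^2)$, regressing each predictor on the others plus an intercept, while `vif_by_correlation` takes the diagonal of the inverse correlation matrix. The predictor matrix `Z` deliberately has no intercept column: the correlation matrix works with centered variables, which is why collinearity with the intercept does not show up in the VIFs.

```python
import numpy as np

def vif_by_regression(Z):
    """VIF_i = 1 / (1 - R_i^2), regressing column i of Z on the others plus an intercept."""
    n, p = Z.shape
    vifs = np.empty(p)
    for i in range(p):
        y = Z[:, i]
        others = np.column_stack([np.ones(n), np.delete(Z, i, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[i] = 1 / (1 - r2)
    return vifs

def vif_by_correlation(Z):
    """Diagonal of the inverse of the predictors' correlation matrix."""
    R = np.corrcoef(Z, rowvar=False)
    return np.diag(np.linalg.inv(R))

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
Z = np.column_stack([x1, x2, x3])   # predictors only, no intercept column

v_reg = vif_by_regression(Z)
v_cor = vif_by_correlation(Z)
print(np.round(v_reg, 1))
print(np.allclose(v_reg, v_cor))
print("serious (> 10):", v_cor > 10)
```

Both functions should give the same values up to rounding; `x1` and `x2` are flagged, while `x3` stays close to 1.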