These are found by minimizing the sum of squared residuals with respect to the parameter values. The least-squares solution vector, $\mathbf{b}$, is found by solving the normal equations, which in matrix form are

$$\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{y}.$$

The least-squares solution vector is therefore

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}.$$
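As a quick numerical illustration, the normal equations can be solved directly with NumPy. The data below are invented purely for demonstration; any small data set with an intercept column would do.

```python
import numpy as np

# Hypothetical data: n = 5 observations, an intercept column plus one regressor.
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 5.0])])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Solve the normal equations X'Xb = X'y for the least-squares vector b.
b = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq reaches the same solution without forming X'X explicitly,
# which is numerically preferable for ill-conditioned design matrices.
b_check, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)
```

In practice one rarely inverts $\mathbf{X}'\mathbf{X}$ directly; solvers based on QR or SVD factorizations (as `lstsq` uses) are more stable, but the normal-equations form above mirrors the algebra in the text.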
Now we come full circle and answer the original question posed at the beginning
of this particular tutorial (see Section V.A): How do we find the expected value and
variance of a coefficient estimate in the linear regression model? In fact, let’s find the
expected value and variance of the entire vector of linear regression estimates. First,
we will assume that the X-values are fixed over repeated sampling. This fixed-X
assumption is a standard assumption in linear regression, although it is routinely
violated. Nevertheless, the results we present hold asymptotically regardless of the
nature of the X's (see, e.g., Greene, 2003). Moreover, we assume that we have a sample of $n$ observations and $p = K + 1$ regressors, including the equation intercept, so that $\mathbf{y}$ and $\boldsymbol{\varepsilon}$ have dimension $n \times 1$, $\mathbf{X}$ has dimension $n \times p$, and $\boldsymbol{\beta}$ has dimension $p \times 1$.
If $\mathbf{X}$ is fixed, the $p \times n$ matrix $(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is a matrix of constants. Call this matrix $\mathbf{A}$. Recall, in general, that if $\mathbf{A}$ is a matrix of constants, then $\mathbf{y} = \mathbf{A}\mathbf{x}$ is called a linear transformation of the vector $\mathbf{x}$, and $E(\mathbf{y}) = \mathbf{A}E(\mathbf{x})$, $V(\mathbf{y}) = \mathbf{A}\mathbf{V}\mathbf{A}'$. Now let $\mathbf{b}$ play the role of $\mathbf{y}$ here, and let $\mathbf{y}$ play the role of $\mathbf{x}$. Then $\mathbf{b} = \mathbf{A}\mathbf{y}$ and we have that

$$E(\mathbf{b}) = \mathbf{A}E(\mathbf{y}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E(\mathbf{y}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{I}\boldsymbol{\beta} = \boldsymbol{\beta}.$$

This shows that the vector of estimates, $\mathbf{b}$, is unbiased for the parameter vector, $\boldsymbol{\beta}$. Now what about $V(\mathbf{b})$? First, we need to observe that if $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, then $V(\mathbf{y}) = V(\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}) = V(\boldsymbol{\varepsilon}) = \sigma^2\mathbf{I} = \mathbf{V}$. (The term $\mathbf{X}\boldsymbol{\beta}$ has no variance over repeated sampling, since $\mathbf{X}$ is fixed and $\boldsymbol{\beta}$ is also a collection of constants.) We then have
$$V(\mathbf{b}) = \mathbf{A}\mathbf{V}\mathbf{A}' = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\sigma^2\mathbf{I}\,\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\mathbf{I} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}.$$
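Both results can be checked by simulation. The sketch below holds an arbitrary design matrix fixed, draws fresh errors on each replication (true $\boldsymbol{\beta}$ and $\sigma^2$ are invented for the demonstration), and compares the empirical mean and covariance of the estimates against $\boldsymbol{\beta}$ and $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed design matrix (intercept plus one regressor) and invented true parameters.
X = np.column_stack([np.ones(4), np.array([2.0, 3.3, 3.9, 7.0])])
beta = np.array([1.0, 0.5])
sigma2 = 4.0

# A = (X'X)^{-1} X' is a matrix of constants because X is fixed.
A = np.linalg.inv(X.T @ X) @ X.T

# Repeated sampling: new errors each replication, the same X every time.
reps = 50_000
errs = rng.normal(0.0, np.sqrt(sigma2), size=(reps, 4))
ys = X @ beta + errs            # each row is one sampled y vector
bs = ys @ A.T                   # each row is b = A y for that sample

print(bs.mean(axis=0))          # should be close to beta (unbiasedness)
print(np.cov(bs.T))             # should be close to sigma2 * (X'X)^{-1}
```

With 50,000 replications the Monte Carlo averages typically agree with the theoretical values to two or three decimal places.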
Substituting the estimate of $\sigma^2$ [which is $\mathrm{SSE}/(n - K - 1)$] into this last expression gives us an estimate of the variance–covariance matrix of the regression parameter estimates.
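The full recipe, from estimates to estimated standard errors, can be sketched in a few lines; the six observations below are invented for illustration. We estimate $\sigma^2$ by $\mathrm{SSE}/(n - K - 1)$ and substitute it into $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$.

```python
import numpy as np

# Hypothetical sample: n = 6 observations, K = 1 regressor, p = K + 1 = 2 columns.
X = np.column_stack([np.ones(6), np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])])
y = np.array([1.2, 2.9, 3.1, 4.8, 5.9, 6.2])
n, p = X.shape
K = p - 1

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y            # least-squares estimates b = (X'X)^{-1} X'y
resid = y - X @ b                # residual vector
sse = resid @ resid              # sum of squared residuals (SSE)
s2 = sse / (n - K - 1)           # estimate of sigma^2

cov_b = s2 * XtX_inv             # estimated variance-covariance matrix of b
se = np.sqrt(np.diag(cov_b))     # standard errors of b0 and b1
print(cov_b)
print(se)
```

The diagonal of `cov_b` holds the estimated variances of the individual coefficients; its square roots are the standard errors reported by regression software.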
Application 3. Using matrix calculations to find the estimates of $b_0$ and $b_1$ in a simple linear regression. Just for practice, let's use the matrix expression for $\mathbf{b}$, $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$, to calculate $\mathbf{b}$ for a simple linear regression model of four observations. It is then left as an exercise for the reader to verify that the same estimates are obtained using the traditional SLR formulas for the intercept and slope (see Chapter 2). The four X-values are 2, 3.3, 3.9, and 7. The four Y-values are, respectively, 5, 2,