Alfred DeMaris - Regression with Social Data, Modeling Continuous and Limited Response Variables

Подождите немного. Документ загружается.

(d) yˆ  4  3x  1.25x

5.16 Suppose that the model for E(Y ) is E(Y )  25  .25x  .15x

 3z, where X and

Z are centered and s

 1.75. Give the value of E(Y) at the mean value of X and

Z, at the mean of X but Z at z

苶

 1 s

, and at the mean of X but Z at z

苶

 1 s

5.17 Suppose that the model for E(Y) is E(Y) 4  3x  .25x

 1.5z  .15xz,

where X and Z are centered and s

 1.75. Give the value of ∂[E(Y)]/∂X at z

苶

 1 s

, and z

苶

 1 s

5.18 Suppose that the model for E(Y) is E(Y)  5  1.75x  3z  .35xz, where X

and Z are centered and s

 1.75. Give the value of ∂[E(Y)]/∂X at z

苶

, z

苶

 1 s

and z

苶

 1 s

5.19 Suppose that the model for E(Y) is E(Y)  15  3x  .25x

 2z  .15xz 

.25x

z, where X and Z are centered and s

 1.75. Give the value of ∂[E(Y )]/

∂X at z

苶

, z

苶

 1 s

, and z

苶

 1 s

5.20 Suppose that the model for E(Y ) is E(Y ) 5.2  5x  3z .45x



.25z

 .13xz  .07xz

 .15x

z  .09x

, where X and Z are centered and

 1.75. Give the value of ∂[E(Y)]/∂X at z

苶

, z

苶

 1 s

, and z

苶

 1 s

In Exercises 5.21 to 5.25, identify whether the equation for E(Y) is charac-

terized by (a) a linear vs. nonlinear model, (b) a linear vs. nonlinear eﬀect of

X or Z (in the absence of any interaction eﬀects only), and (c) a linear vs.

nonlinear interaction in each of X and Z.

5.21 E(Y)  α  β log x  γ z  δz log x.

5.22 E(Y)  α  βx  γx

 δx

 φz

1/2

λxz

1/2

5.23 E(Y)  log(α  βx  γz  λxz).

5.24 E(Y)  α  β

 γ

 λ

5.25 E(Y)  α  x

 γz  λz

 φzx

EXERCISES 195

c05.qxd 8/27/2004 2:53 PM Page 195

196

Regression with Social Data: Modeling Continuous and Limited Response Variables,

By Alfred DeMaris

CHAPTER 6

Advanced Issues in Multiple

Regression

CHAPTER OVERVIEW

In this chapter I address a number of topics that are of a more advanced nature. Their

complexity is due primarily to the necessity to resort to matrix algebra for much of

their theoretical development. As matrix algebra may be foreign to many readers, this

is arguably the most diﬃcult chapter in the book. For this reason, the reader is

strongly encouraged to read Section V of Appendix A before proceeding with this

chapter. A familiarity with the notation and major concepts of matrix algebra will be

extremely helpful for getting the most out of this material. On the other hand, those

uncomfortable with the matrix developments can simply skip them and attend only

to the “bottom line,” as expressed in equations such as (6.5), (6.11), and (6.12) and

accompanying discussions.

I begin by reviewing the matrix representation of the multiple regression model.

I then take up the topic of heteroscedasticity and weighted least squares (WLS) esti-

mation, the optimal estimation procedure when the homoscedasticity assumption

fails. Along with this I discuss the use of WLS in testing slope homogeneity across

groups when the assumption of equal error variance fails. I also consider the issue of

using weighted regression on data from complex sampling schemes, employing

WLS with sampling weights. This technique is referred to as weighted ordinary least

squares (WOLS) (Winship and Radbill, 1994). I then return to the issue of omitted-

variable bias, giving a formal development of the problem in the context of multiple

regression. I also give an example showing how omitted-variable bias can aﬀect

interaction terms. The latter part of the chapter is devoted to regression diagnostics.

In particular, I explain how to diagnose regression analyses for undue inﬂuence

exerted by one or more observations. I also give a detailed explication of the problem

c06.qxd 8/27/2004 2:53 PM Page 196

of multicollinearity, its diagnosis, and possible remedies. The chapter ends by con-

sidering two techniques designed to improve on OLS estimates in the presence of

severe collinearity: ridge regression and principal components regression.

MULTIPLE REGRESSION IN MATRIX NOTATION

The Model

In Section V of Appendix A, I outline the matrix representation of the multiple regres-

sion model. Let’s review the basic concepts covered there. Recall that the matrix rep-

resentation of the model for the ith observation is y

 x



ββ

 ε

, where x

 is a 1  p

vector of scores on the p regressors in the model for the ith observation. Here,

p  K  1, and the ﬁrst regressor score is a “1” that serves as the regressor for the

intercept term. Further,

ββ

is a p  1 vector of the parameters in the model, with the

ﬁrst parameter being the intercept, β

. Y

and ε

are the ith response score and the ith

error term, as always. The matrix representation of the model for all n of the y scores

is y  X

ββ



εε

. Here, y is an n  1 vector of response scores, X is an n  p matrix of

the regressor scores for all n observations, and

εε

is an n  1 vector of equation errors

for the n observations. The ith row of X is, of course, x

. As always, it is assumed that

the errors have mean zero and constant variance σ

and are uncorrelated with each

other. These assumptions are encapsulated in the notation ε ⬃ f(0, σ

I). This means

that the errors have some density function, f(

) (typically assumed to be symmetric

about zero, but not necessarily normal except for small samples) with zero mean and

variance–covariance matrix σ

I. (Readers possibly used to the notation x

 for the rep-

resentation of the vector of regressor scores for the ith case may ﬁnd the notation x



used in this book to be somewhat unusual. However, in that the ith case’s regressor

values are contained in the ith row of the n  p matrix of regressor values for all n

observations, and as I use the superscript i to denote row vectors, the use of x

 seems

more appropriate. Note that the ith case’s collection of regressor values written as a

column vector is therefore denoted x

throughout the book.)

OLS Estimates

The vector of OLS estimates of the model parameters is denoted b, and as noted in

Appendix A, its solution is b  (XX)

1

Xy. In Chapter 2 I noted that b

in SLR was

a weighted sum of the y

and therefore normally distributed in large samples, due to

the CLT. Similarly, each of the b

in the multiple regression model is a weighted

sum, or linear combination, of the y

and is therefore also asymptotically normal.

This is readily seen by denoting the p  n matrix (XX)

1

X by the symbol G, and

its kth row (where k  0,1,...,K) as g

. Then the kth regression estimate has the

form g

y. Assuming that the X’s are ﬁxed over repeated sampling (the standard

ﬁxed-X assumption), this is nothing more than a weighted sum of the y’s. The esti-

mates are unbiased for their theoretical counterparts, since, as shown in Appendix A,

E(b) 

ββ

. The variance–covariance matrix for b, denoted V(b), is σ

(XX)

1

. The

variances of the parameter estimates lie on the diagonal of this matrix.

MULTIPLE REGRESSION IN MATRIX NOTATION 197

c06.qxd 8/27/2004 2:53 PM Page 197

To illustrate the form of σ

(XX)

1

, let’s derive the expressions for the variances

of the SLR estimates, as given in Chapter 2, using matrix manipulations. In SLR, the

X matrix can be written [1x], where 1 is a vector of ones and x is the column vec-

tor of scores on the independent variable. Therefore, XX is

XX 

冤冥

[1x] 

冤冥



冤冥

The determinant of XX is

n冱x

 冢冱x冣

 n冱x

 n

苶

 n冢冱 x

 nx

苶

冣 nS

where S

 冱(x  x

苶

)

[see Appendix A, Section II.C(1)]. The inverse of XX is,

therefore,

(XX)

1





冤冥



冤冥

Finally, σ

(XX)

1

(XX)

1



冤冥

As the reader can see, the expressions for V(b

) and V(b

) on the diagonal of

(XX)

1

are the same as given in Chapter 2.

Hat Matrix. The ﬁtted values in regression, denoted yˆ

, are given by yˆ

 x

b. The

vector of ﬁtted values is therefore given by yˆ  Xb. Substituting for b, we have

yˆ  X(XX)

1

Xy, or yˆ  Hy. H, equal to X(XX)

1

X, is called the hat matrix

(Belsley et al., 1980), since it converts y into yˆ. This matrix plays a key role in the

inﬂuence diagnostics discussed later in the chapter. Of particular interest are the

diagonals of this matrix, denoted h

. These tap into the leverage, or potential for

inﬂuence, exerted on the regression estimates by the ith observation. The matrix for-

mula for h

is h

 x

(XX)

1

Regression Model in Standardized Form

Recall from Chapter 2 that the standardized slope in SLR results from the OLS

regression of the standardized version of y on the standardized version of X. The

same holds true in multiple regression. However, to understand the standardized rep-

resentation of the MULR model, we must ﬁrst examine the matrix representation of

the standardized variable scores. Suppose that we denote by y

the n  1 vector of





冱



冱





冱





冱



冱





冱





冱



冱

1x

xx

11

x1

1

x

198 ADVANCED ISSUES IN MULTIPLE REGRESSION

c06.qxd 8/27/2004 2:53 PM Page 198

standardized y-scores and by Z the n  K matrix of standardized X scores (the “1”

disappears from this matrix in the standardization process; the standardized equation

therefore has no intercept). Now, note that (1/n)ZZ  R

, the correlation matrix for

the X’s, and (1/n)Zy

 r

, the vector of correlations between the X’s and y. Why?

First, understand that the ith element in the kth column of Z is of the form







苶



where s

is the standard deviation of the kth regressor. That is, the ikth element of Z

is the kth variable minus its mean divided by its standard deviation for the ith case.

Partitioning Z by its columns, ZZ is

ZZ 

冤

冥

...

] 

冤

冥

This is a K  K matrix whose kth diagonal element is

z





冱



苶

)



and whose oﬀ-diagonal elements are of the form

z





冱



苶

)

x

苶

)



where k and l denote two diﬀerent regressors in the model. Multiplying this matrix

by 1/n results in the kth diagonal element being of the form



冱



苶

)



 1,

and the oﬀ-diagonal elements being of the form



冱

 x

苶

)

(

 x

苶

)/n





That is, the result is the correlation matrix for the X’s in the model. [Technically, we

should be multiplying by 1/(n  1) instead of 1/n, but asymptotically these are equiv-

alent, and 1/n simpliﬁes the expression. It should be evident that the diﬀerence

between n  1 and n is virtually nil in large samples.] A similar argument demon-

strates that (1/n)Zy

 r

z

...

z

MULTIPLE REGRESSION IN MATRIX NOTATION 199

c06.qxd 8/27/2004 2:53 PM Page 199

To write the model in standardized form, we shall ﬁnd it convenient to transform

the response and regressors as follows. Let X*  (1/兹n

苶

)Z and y*  (1/兹n

苶

Myers (1986, p. 76) refers to y* and X* as the vector and matrix, respectively, of

centered and scaled variables. That is, the kth column of X*, for example, represents

a variable of the form



兹冱

苶苶

i

苶

(

苶



苶



苶

)

苶



The vector y* represents Y in similar form. Then the standardized version of the model

is y* X*

ββ



εε

, and the standardized estimates are obtained via the OLS solution:

 (X*X*)

1

X*y* 

冤冢



兹

苶



冣



冢



兹

苶



冣冥

1

冢



兹

苶



冣





兹

苶





冢



ZZ

冣

1



Zy

 R

1

In other words, the standardized regression coeﬃcients are the product of the inverse

of the correlation matrix for the X’s times the correlations of the X’s with y. Further,

letting σ

represent V(ε*), we then denote the variance–covariance matrix of the

errors in the standardized equation by σ

I. Then the variance–covariance matrix of

standardized parameter estimates is V(b

)  σ



)

1

 σ

1

. Having estab-

lished the matrix representations for key elements in the MULR model, we are now

ready to consider additional MULR topics.

HETEROSCEDASTICITY AND WEIGHTED LEAST SQUARES

Until now the standard assumption we have been operating under is that the equation

errors are homoscedastic; that is, they have constant variance at each covariate pat-

tern. In that case, V(

εε

)  σ

I. Suppose that this isn’t the case. That is, suppose that the

error variances vary across covariate patterns such that V(ε

)  σ

for i  1,2,...,n.

Assuming that the errors are still uncorrelated, the form of V(

εε

) is now

冤冥

 V.

Under this scenario the appropriate estimator is the generalized least squares (GLS)

estimator, given by b

 (XV

1

XV

1

y (Myers, 1986). If V  σ

I, b

is sim-

ply b, the OLS estimator. Because V is diagonal, and taking the inverse of a diago-

nal matrix is accomplished by simply inverting the diagonal elements, b

has the

form

 (XD

1

XD

...

0 σ

... .

. ... ... .

. ... ...

...

0 σ

200 ADVANCED ISSUES IN MULTIPLE REGRESSION

c06.qxd 8/27/2004 2:53 PM Page 200

where D

indicates a diagonal matrix with diagonal entries w

 1/σ

. Now, if we

regard b

more closely, we see that it can be expressed as



[(

兹w

苶

)



(

兹w

苶

)]

1

(

兹w

苶

)



兹w

苶

Recall that premultiplying a matrix or vector by a diagonal matrix simply multiplies

each row of that matrix or vector by the diagonal elements. Therefore, the design

matrix now has the form

兹w

苶

 D

兹w

苶

[1x

...

] 

冤冥

 X

and the response vector has the form

兹w

苶

y 

冋册

 y

If we regress y

on X

using OLS, the resulting estimator, (X

)

1

X

, will be

(as the reader can verify by substituting for X

and y

in this expression). What

this means is that b

can be found by transforming the regressors and the response

and then performing OLS on the transformed variables. The transformation involves

multiplying the regressors and the response by the square root of the weight vari-

able, where the weights are the reciprocals of the error variances. Hence this esti-

mator is called the weighted least squares (WLS) estimator. Notice that the ﬁrst

column of the transformed design matrix is no longer a column of ones. Instead, it

is a column of weights, which constitutes another variable. Consequently, the appro-

priate OLS regression analysis is a regression through the origin. The constant term

for the WLS analysis is the coeﬃcient for the weight variable resulting from this run

(McClendon, 1994).

Properties of the WLS Estimator

If the proper weights, 1/σ

, are known, the WLS estimator b

has the following prop-

erties: (1) it is unbiased for

ββ

; (2) it achieves the minimum variance of all linear unbi-

ased estimators; and (3) it is the MLE for

ββ

if the errors are also normally distributed

(Myers, 1986). Unfortunately, the true error variances are typically unavailable, so the

proper weights are rarely known in practice. In this case, the error variances must ﬁrst

be estimated from the data available and then used in the weighting procedure. The

resulting WLS estimator, also known as the feasible generalized least squares (FGLS)

兹w

苶

兹w

苶



兹w

苶

兹w

苶

兹w

苶



兹w

苶

...

兹w

苶

兹w

苶



兹w

苶

兹w

苶

兹w

苶



兹w

苶

HETEROSCEDASTICITY AND WEIGHTED LEAST SQUARES 201

c06.qxd 8/27/2004 2:53 PM Page 201

estimator, is no longer unbiased; however, it is still consistent and has a smaller sam-

pling variance than that of the OLS estimator in large samples (Wooldridge, 2000).

Consequences of Heteroscedasticity

What are the consequences of using OLS in the presence of heteroscedasticity? First,

the OLS estimators are ineﬃcient. That is, there exist estimators with a smaller sam-

pling variance: namely, the WLS (i.e., FGLS) estimator. This means that tests of

signiﬁcance will typically be more sensitive when using WLS than when using OLS.

Perhaps more important, however, the estimated standard errors of the OLS coeﬃcients

obtained via the formula σ

(XX)

1

(the formula employed by all regression software)

are no longer valid under heteroscedasticity. To see why, recall from Appendix A that

for the OLS b,

V(b)  V[(XX)

1

Xy]  (XX)

1

XV(y)X(XX)

1

As long as V(y)  σ

I, this matrix reduces to σ

(XX)

1

and then σ

(XX)

1

becomes

an unbiased estimator of V(b). However, when heteroscedasticity prevails, V(y)  V

(shown above). The variance of b is then

V(b)  V[(XX)

1

Xy]  (XX)

1

XVX(XX)

1

Hence, σ

(XX)

1

is no longer a valid estimator of this variance–covariance matrix.

White’s Estimator of V(b). An alternative estimator of V(b) which is robust to het-

eroscedasticity is the White estimator (White, 1980). This is based on the idea that

the OLS b is a consistent estimator of

ββ

, which implies that the OLS residuals are

“pointwise consistent estimators” (Greene, 2003, p. 198) of the population ε

Moreover, the squared residuals would be consistent estimators of the squared ε

whose average values represent the variances of the ε

, since V(ε

)  E(ε

 E(ε

))



E(ε

). Assuming at least one continuous predictor, the squared error for each case

would typically be unique, so it represents its own average. Hence, the squared OLS

residuals can be used to estimate the error variances in V, leading to the White het-

eroscedasticity-robust estimator,

(b)  (XX)

1

XV

X(XX)

1

where V

is a diagonal matrix containing the squared OLS residuals (White, 1980).

Testing for Heteroscedasticity

In this section I discuss two tests for heteroscedasticity that are relatively easy to

implement in any regression software that makes the residuals available for further

manipulation. These tests are also available on request in some software (e.g., STATA,

LIMDEP). The null hypothesis for either test is that the errors are homoscedastic. The

202 ADVANCED ISSUES IN MULTIPLE REGRESSION

c06.qxd 8/27/2004 2:53 PM Page 202

ﬁrst is White’s test (White, 1980). For the substantive model E(Y)  β

 β



...

 β

, the test is accomplished by ﬁrst estimating the model with OLS

and then saving the residuals. One then regresses the squares of the residuals on all pre-

dictors in the model plus all nonredundant crossproducts among the predictors. For

example, if the model is E(Y)  β

 β

, one regresses the squared

residuals from estimating this model on X

, X

, and X

The test is then nR

from this run, which, under the null hypothesis of homoscedastic-

ity, is distributed as chi-squared with degrees of freedom equal to the number of regres-

sors used to model the squared residuals (in this case, that would be 9). Be advised,

however, that more than just homoscedasticity is being tested with this statistic. In fact,

White (1980, p. 823) describes this test as follows: “. . . the null hypothesis maintains

not only that the errors are homoskedastic, but also that they are independent of the

regressors, and that the model is correctly speciﬁed... .”In other words, this is more

of a general test for model misspeciﬁcation.

The second test is the Breusch–Pagan test (Breusch and Pagan, 1979). This test

is much more focused on heteroscedasticity than White’s test, and, in fact, assumes

that the error variance is related in some systematic fashion to the model predictors

(Greene, 2003). There are various forms of the test, but the one I advocate here is

that suggested by Wooldridge (2000). In this case, one regresses the squared OLS

residuals from one’s substantitve model on just the predictors in one’s substantive

model. Once again, the test is nR

from this regression, which is distributed as chi-

squared with K degrees of freedom under the null hypothesis of homoscedasticity.

Example: Regression of Coital Frequency

Let’s consider an example of a heteroscedastic model. Figure 6.1 presents a scatter-

plot of coital frequency in the last month against the male partner’s age for the 416

couples in the couples dataset. (This is similar to Figure 2.7, which shows a plot of

the residuals from this regression against male partner’s age.) Nonconstant error

variance is suggested by the wedge-shaped trend in the points in which the spread of

points tapers down dramatically from left to right. This phenomenon makes sub-

stantive sense. One would expect that there would be considerable variability in

coital frequency among young couples, since some are by choice more sexually

active than others. However, age brings its own limitations to sexual activity, regard-

less of individual proclivities. We would therefore expect less variability in sexual

frequency as couples enter middle and old age.

Table 6.1 presents regression results for these data. The ﬁrst column shows the OLS

results for the regression of coital frequency on male’s age. The eﬀect of male’s age is

negative and signiﬁcant and suggests that each additional year of age reduces the cou-

ple’s coital frequency by about .15 time per month. The eﬀect is quite signiﬁcant. The

second column shows the standard errors estimated by White’s heterscedasticity-robust

technique. Compared to the OLS standard errors, the White standard errors are larger

for the intercept but slightly smaller for the eﬀect of age. No real substantive diﬀerence

would result from using these standard errors in place of the OLS ones, however. The

third column presents the model for White’s test, which includes both male age and its

HETEROSCEDASTICITY AND WEIGHTED LEAST SQUARES 203

c06.qxd 8/27/2004 2:53 PM Page 203

square. The dependent variable for this run is the squared residual from the model in

the ﬁrst column. White’s test is therefore 416(.0423)  17.597. With 2 degrees of free-

dom, this is quite signiﬁcant (p  .00015). For the Breusch–Pagan test, we regress the

squared residuals from the substantive model on male’s age (results not shown). The

from this run is .0328. Hence, the Breusch–Pagan test is 416(.0328)  13.645,

204 ADVANCED ISSUES IN MULTIPLE REGRESSION

Figure 6.1 Scatterplot of coital frequency with male’s age for 416 couples in the NSFH.

Table 6.1 OLS and WLS Results for the Regression of Coital

Frequency on Male Age for 416 Couples in the NSFH

OLS: White Model for WLS:

Predictor b(σ

) σ

White’s Test

b(σ

)

Intercept 14.028*** 201.057*** 13.522***

(.874) (.998) (.992)

Male age .149*** 6.234** .143***

(.019) (.018) (.016)

(Male age)

.049*

OLS

.1295 .0423 .1570

WLS

.1280

Response variable is the squared OLS residual.

* p  .05. ** p  .01. *** p  .001.

c06.qxd 8/27/2004 2:53 PM Page 204