MODELS WITH EXCLUSIVELY CATEGORICAL PREDICTORS
Dummy Coding
Categorical predictors cannot simply be entered as is into a regression equation.
One obvious reason is that the values may not convey any real quantitative informa-
tion, as in the case of a nominal variable. Even with a quantitative variable, however,
its relationship with Y may not be linear. What is needed is a system of coding
that is invariant to both the qualitative nature of a covariate’s values and to the func-
tional form of its relationship with Y. One such system is called dummy coding. The
name comes from the fact that the codes—ones and zeros—only represent whether
or not a case is in a given category of the variable, and otherwise convey no quanti-
tative meaning. As an example, regard Table 4.1, which presents average academic-
year salaries for 725 faculty members at Bowling Green State University (BGSU)
according to college and to whether they are on graduate faculty. Suppose that we
wish to regress academic year salary on whether or not someone is on graduate
faculty (a status that depends on research productivity and, when conferred, allows
one to teach graduate classes). We create a variable, GRAD, coded 1 if the person
is on graduate faculty and 0 otherwise. This is called a dummy variable. Letting
Y = academic-year salary, the model is E(Y) = β₀ + δ(GRAD) (I like to use deltas to
denote the coefficients of dummy variables). How is this interpreted? Well, for those
who are not on graduate faculty, the mean salary is E(Y) = β₀ + δ(0) = β₀. Thus, the
intercept is the mean of Y for those in the group coded 0, which is called the
contrast, reference, or omitted group. The mean salary for those on graduate faculty
is E(Y) = β₀ + δ(1) = β₀ + δ. I refer to this group as the interest category. The
difference in means between these two groups is E(Y | on graduate faculty) − E(Y | not
on graduate faculty) = β₀ + δ − β₀ = δ. A test of whether or not this mean difference
is significant is a test of H₀: δ = 0. This is just the usual test for the significance of
a regression coefficient, consisting of the parameter estimate, d, divided by its
estimated standard error. Least squares estimates of the parameters are obtained in
the usual fashion, by minimizing SSE with respect to the parameters. The least
squares estimate of β₀ is the sample mean for the omitted group, while the least
squares estimate of δ is the difference in sample means for the interest and omitted
groups. The estimated regression equation in this case is ŷ = 39582 + 11393(GRAD).
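These facts about the least squares estimates are easy to verify numerically. The sketch below uses invented salary data (not the BGSU figures) and checks that the fitted intercept equals the omitted-group sample mean and the fitted slope equals the difference in sample means:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data for illustration: two groups of academic-year salaries
y0 = rng.normal(40000, 8000, size=60)   # not on graduate faculty (GRAD = 0)
y1 = rng.normal(51000, 8000, size=90)   # on graduate faculty (GRAD = 1)

y = np.concatenate([y0, y1])
grad = np.concatenate([np.zeros(60), np.ones(90)])

# Design matrix: a column of ones for the intercept plus the dummy
X = np.column_stack([np.ones_like(grad), grad])
b0, d = np.linalg.lstsq(X, y, rcond=None)[0]

# Intercept = mean of the omitted group; slope = difference in group means
assert np.isclose(b0, y0.mean())
assert np.isclose(d, y1.mean() - y0.mean())
```

With a single dummy regressor, least squares simply reproduces the two group means, so the fitted values for every case are the sample mean of that case's group.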
From Table 4.1 it is evident that the intercept here is just the mean salary for those
not on graduate faculty, and the slope is the difference in mean salaries for the two
groups: 50975.061 − 39581.895 = 11393.166. The test statistic for the slope
(not shown) is a t value of 10.552, which is highly significant (p < .0001). Recall
that the regression model assumes equal error variance, implying equal Y variance,
at each covariate pattern. There are only two covariate patterns here, GRAD = 1 and GRAD = 0.
The assumption, therefore, is equal Y variance in each group—those on graduate
faculty and those not on graduate faculty—in the population. In other words, in
this case, regression accomplishes a test for the difference between group means
under the assumption of equal Y variance and is therefore equivalent to the two-
sample t test.
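This equivalence with the two-sample t test can also be checked directly. The sketch below (again with invented data) computes the slope's t statistic from the regression output and compares it with the pooled-variance two-sample t statistic; the two agree to machine precision:

```python
import numpy as np

rng = np.random.default_rng(1)
y0 = rng.normal(40000, 8000, size=50)   # GRAD = 0
y1 = rng.normal(51000, 8000, size=70)   # GRAD = 1

y = np.concatenate([y0, y1])
x = np.concatenate([np.zeros(50), np.ones(70)])
X = np.column_stack([np.ones_like(x), x])
n, k = X.shape

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s2 = resid @ resid / (n - k)              # MSE on n - 2 degrees of freedom
cov = s2 * np.linalg.inv(X.T @ X)
t_reg = beta[1] / np.sqrt(cov[1, 1])      # t statistic for the dummy's slope

# Pooled two-sample t statistic under the equal-variance assumption
n0, n1 = len(y0), len(y1)
sp2 = ((n0 - 1) * y0.var(ddof=1) + (n1 - 1) * y1.var(ddof=1)) / (n0 + n1 - 2)
t_pool = (y1.mean() - y0.mean()) / np.sqrt(sp2 * (1 / n0 + 1 / n1))

assert np.isclose(t_reg, t_pool)          # the two tests coincide
```

The agreement is exact, not approximate: with a single dummy regressor, the residual mean square equals the pooled variance estimate, and the (1, 1) element of (X'X)⁻¹ is 1/n₀ + 1/n₁, so the regression t statistic is algebraically identical to the pooled two-sample t.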