others will just tell you there is perfect collinearity. It is best to carefully specify the dummy
variables, because it forces us to properly interpret the final model.
Even though single men is the base group in (7.11), we can use this equation to obtain
the estimated difference between any two groups. Since the overall intercept is common to
all groups, we can ignore that in finding differences. Thus, the estimated proportionate difference between single and married women is −.110 − (−.198) = .088, which means that single women earn about 8.8% more than married women. Unfortunately, we cannot use
equation (7.11) for testing whether the estimated difference between single and married
women is statistically significant. Knowing the standard errors on marrfem and singfem is
not enough to carry out the test (see Section 4.4). The easiest thing to do is to choose one
of these groups to be the base group and to reestimate the equation. Nothing substantive
changes, but we get the needed estimate and its standard error directly. When we use married women as the base group, we obtain
$$\widehat{\log(wage)} = \underset{(.106)}{.123} + \underset{(.056)}{.411}\,marrmale + \underset{(.058)}{.198}\,singmale + \underset{(.052)}{.088}\,singfem + \cdots,$$
where, of course, none of the unreported coefficients or standard errors have changed. The
estimate on singfem is, as expected, .088. Now, we have a standard error to go along with
this estimate. The t statistic for the null that there is no difference in the population
between married and single women is t_singfem = .088/.052 ≈ 1.69. This is marginal evidence against the null hypothesis. We also see that the estimated difference between married men and married women is very statistically significant (t_marrmale ≈ 7.34).
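For readers who want to reproduce this re-estimation step in software, the sketch below is not code from the text; it assumes a pandas DataFrame with hypothetical names: a file wage1.csv, a log-wage column lwage, a four-category marital/gender column mstatus coded singmale, marrmale, marrfem, and singfem, and controls educ, exper, and tenure. The only point is that changing the base category delivers the singfem estimate and its standard error directly, so the t statistic can be read off the output.

```python
# A minimal sketch, not the textbook's code; file and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("wage1.csv")  # hypothetical file containing the wage data

# Base group = single men, as in equation (7.11): patsy omits the reference level.
m_single_men = smf.ols(
    "lwage ~ C(mstatus, Treatment(reference='singmale')) + educ + exper + tenure",
    data=df,
).fit()

# Base group = married women: the same regression, but now the singfem coefficient
# and its standard error give the single-vs-married-women comparison directly.
m_married_women = smf.ols(
    "lwage ~ C(mstatus, Treatment(reference='marrfem')) + educ + exper + tenure",
    data=df,
).fit()
print(m_married_women.summary())  # the t statistic on singfem is roughly .088/.052 ≈ 1.69
```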
The previous example illustrates a general principle for including dummy variables to indicate different groups: if the regression model is to have different intercepts for, say, g groups or categories, we need to include g − 1 dummy variables in the model along with an intercept. The intercept for the base group is the overall intercept in the model, and the dummy variable coefficient for a particular group represents the estimated difference in intercepts between that group and the base group. Including g dummy variables along with an intercept will result in the dummy variable trap. An alternative is to include g dummy variables and to exclude an overall intercept. This is not advisable because testing for differences relative to a base group becomes difficult, and some regression packages alter the way the R-squared is computed when the regression does not contain an intercept.
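To see the g versus g − 1 rule concretely, the sketch below (again under the same hypothetical column names, not code from the text) builds two design matrices: dropping one category while keeping the intercept avoids the trap, whereas keeping all g dummies plus an intercept makes the columns perfectly collinear.

```python
# A minimal sketch of the dummy variable trap; file and column names are hypothetical.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("wage1.csv")  # hypothetical file containing the wage data

# g - 1 dummies plus an intercept: drop_first=True omits the base category, so each
# remaining dummy measures an intercept shift relative to that base group.
X_ok = sm.add_constant(pd.get_dummies(df["mstatus"], drop_first=True).astype(float))

# All g dummies plus an intercept: the dummy columns sum to the constant column,
# which is exactly the perfect collinearity of the dummy variable trap.
X_trap = sm.add_constant(pd.get_dummies(df["mstatus"], drop_first=False).astype(float))

ok_fit = sm.OLS(df["lwage"], X_ok).fit()      # estimates without difficulty
trap_fit = sm.OLS(df["lwage"], X_trap).fit()  # singular design; exact behavior depends on
                                              # the package (some drop a dummy, others
                                              # report perfect collinearity)
```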
Incorporating Ordinal Information by Using Dummy
Variables
Suppose that we would like to estimate the effect of city credit ratings on the munici-
pal bond interest rate (MBR). Several financial companies, such as Moody’s Investment
Service and Standard and Poor’s, rate the quality of debt for local governments, where
QUESTION 7.2
In the baseball salary data found in MLB1.RAW, players are given one of six positions: frstbase, scndbase, thrdbase, shrtstop, outfield, or catcher. To allow for salary differentials across position, with outfielders as the base group, which dummy variables would you include as independent variables?