just tell you there is perfect collinearity. It is best to carefully specify the dummy variables
because then we are forced to properly interpret the final model.
Even though single men is the base group in (7.11), we can use this equation to obtain
the estimated difference between any two groups. Because the overall intercept is common
to all groups, we can ignore that in finding differences. Thus, the estimated proportionate difference between single and married women is -.110 - (-.198) = .088, which means that single women earn about 8.8% more than married women. Unfortunately, we cannot use
equation (7.11) for testing whether the estimated difference between single and married
women is statistically significant. Knowing the standard errors on marrfem and singfem is not
enough to carry out the test (see Section 4.4). The easiest thing to do is to choose one of
these groups to be the base group and to reestimate the equation. Nothing substantive
changes, but we get the needed estimate and its standard error directly. When we use mar-
ried women as the base group, we obtain
log(wage)^ = .123 + .411 marrmale + .198 singmale + .088 singfem + …,
            (.106)  (.056)          (.058)           (.052)
where, of course, none of the unreported coefficients or standard errors have changed. The estimate on singfem is, as expected, .088. Now, we have a standard error to go along with this estimate. The t statistic for the null that there is no difference in the population between married and single women is t_singfem = .088/.052 ≈ 1.69. This is marginal evidence against the null hypothesis. We also see that the estimated difference between married men and married women is very statistically significant (t_marrmale = 7.34).
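As a rough illustration (not from the text), the following Python sketch re-estimates the wage equation with married women as the base group using statsmodels, so the coefficient and standard error on singfem are reported directly. The file name wage1.csv and the variable names (female, married, lwage, educ, exper, tenure) are assumptions about how the WAGE1.RAW data might be stored.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("wage1.csv")  # hypothetical export of WAGE1.RAW

# Build the group dummies from the (assumed) 0/1 indicators female and
# married, leaving out marrfem so that married women are the base group.
df["marrmale"] = ((df["married"] == 1) & (df["female"] == 0)).astype(int)
df["singmale"] = ((df["married"] == 0) & (df["female"] == 0)).astype(int)
df["singfem"] = ((df["married"] == 0) & (df["female"] == 1)).astype(int)

model = smf.ols(
    "lwage ~ marrmale + singmale + singfem"
    " + educ + exper + I(exper**2) + tenure + I(tenure**2)",
    data=df,
).fit()

print(model.params[["marrmale", "singmale", "singfem"]])
print(model.tvalues["singfem"])  # t statistic for H0: no married/single gap among women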
The previous example illustrates a general principle for including dummy variables to
indicate different groups: if the regression model is to have different intercepts for, say, g
groups or categories, we need to include g - 1 dummy variables in the model along with
an intercept. The intercept for the base group is the overall intercept in the model, and the
dummy variable coefficient for a particular
group represents the estimated difference in
intercepts between that group and the base
group. Including g dummy variables along
with an intercept will result in the dummy
variable trap. An alternative is to include g
dummy variables and to exclude an overall
intercept. This is not advisable because
testing for differences relative to a base
group becomes difficult, and some regression packages alter the way the R-squared is com-
puted when the regression does not contain an intercept.
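The g - 1 rule is straightforward to apply in software. The following sketch, using a small made-up data set (the numbers are purely illustrative, not from the text), shows how pandas can create g - 1 dummies by dropping one category, which then serves as the base group.

import pandas as pd
import statsmodels.formula.api as smf

# Illustrative data: three categories (g = 3), so we keep g - 1 = 2 dummies.
df = pd.DataFrame({
    "y": [1.2, 2.3, 1.9, 2.8, 3.1, 2.2],
    "group": ["A", "B", "C", "A", "B", "C"],
})

# drop_first=True omits group A; including all three dummies plus an
# intercept would be perfectly collinear (the dummy variable trap).
dummies = pd.get_dummies(df["group"], prefix="group", drop_first=True, dtype=float)
data = pd.concat([df, dummies], axis=1)

model = smf.ols("y ~ group_B + group_C", data=data).fit()
# The intercept is the base-group (A) intercept; each dummy coefficient is
# the estimated difference in intercepts between that group and group A.
print(model.params)

An equivalent shortcut in the formula interface is y ~ C(group), which applies the same treatment coding and drops one level automatically.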
Incorporating Ordinal Information by Using Dummy Variables
Suppose that we would like to estimate the effect of city credit ratings on the municipal
bond interest rate (MBR). Several financial companies, such as Moody’s Investors Service
and Standard and Poor’s, rate the quality of debt for local governments, where the ratings
QUESTION 7.2
In the baseball salary data found in MLB1.RAW, players are given one of six positions: frstbase, scndbase, thrdbase, shrtstop, outfield, or catcher. To allow for salary differentials across position, with outfielders as the base group, which dummy variables would you include as independent variables?