A truncated regression model arises when we exclude, on the basis of \(y\), a subset of the
population in our sampling scheme. In other words, we do not have a random sample from
the underlying population, but we know the rule that was used to include units in the
sample. This rule is determined by whether \(y\) is above or below a certain threshold. We explain
more fully the difference between censored and truncated regression models later.
Censored Regression Models
While censored regression models can be defined without distributional assumptions, in
this subsection we study the censored normal regression model. The variable we would
like to explain, \(y\), follows the classical linear model. For emphasis, we put an \(i\) subscript
on a random draw from the population:
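\[
y_i = \beta_0 + \mathbf{x}_i \boldsymbol{\beta} + u_i, \qquad u_i \mid \mathbf{x}_i, c_i \sim \mathrm{Normal}(0, \sigma^2) \tag{17.36}
\]
\[
w_i = \min(y_i, c_i). \tag{17.37}
\]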
Rather than observing \(y_i\), we observe it only if it is less than a censoring value, \(c_i\).
Notice that (17.36) includes the assumption that \(u_i\) is independent of \(c_i\). (For
concreteness, we explicitly consider censoring from above, or right censoring; the problem of
censoring from below, or left censoring, is handled similarly.)
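The censoring rule in (17.37) can be illustrated with a short simulation. This is only a sketch under stated assumptions: a single regressor, a constant censoring value, and parameter values chosen purely for illustration. None of these come from the text, and the function name `simulate_censored` is hypothetical.

```python
import numpy as np

def simulate_censored(n, beta0, beta1, sigma, c, seed=0):
    """Draw (x, y, w) from a right-censored normal regression model.

    Assumptions for illustration only: one regressor x, and a censoring
    value c that is the same for every observation.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    u = rng.normal(scale=sigma, size=n)   # u is drawn independently of x and c
    y = beta0 + beta1 * x + u             # latent outcome, as in (17.36)
    c_i = np.full(n, c)                   # constant censoring threshold
    w = np.minimum(y, c_i)                # observed outcome, as in (17.37)
    censored = y >= c_i                   # True where only the threshold is seen
    return x, y, w, censored

# Hypothetical parameter values.
x, y, w, censored = simulate_censored(n=10_000, beta0=1.0, beta1=2.0,
                                      sigma=1.0, c=2.0)
print("share censored:", censored.mean())
```

For uncensored draws, `w` equals the latent `y`; for censored draws, `w` equals the threshold, so all we learn is that `y` is at least as large as `c`.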
One example of right data censoring is top coding. When a variable is top coded, we know its
value only up to a certain threshold. For responses greater than the threshold, we only know
that the variable is at least as large as the threshold. For example, in some surveys, family
wealth is top coded. Suppose that respondents are asked their wealth, but people are allowed
to respond with “more than $500,000.” Then, we observe actual wealth for those respondents
whose wealth is less than $500,000 but not for those whose wealth is greater than
$500,000. In this case, the censoring threshold, \(c_i\), is the same for all \(i\). In many
situations, the censoring threshold changes with individual or family characteristics.
If we observed a random sample for \((\mathbf{x}, y)\), we would simply estimate \(\boldsymbol{\beta}\) by OLS, and
statistical inference would be standard. (We again absorb the intercept into \(\mathbf{x}\) for
simplicity.) The censoring causes problems. Using arguments similar to those for the Tobit model, an OLS
regression using only the uncensored observations (that is, those with \(y_i < c_i\)) produces
inconsistent estimators of the \(\beta_j\). An OLS regression of \(w_i\) on \(\mathbf{x}_i\), using all observations,
does not consistently estimate the \(\beta_j\), unless there is no censoring. The resulting
inconsistency resembles the Tobit case, but the underlying problem is quite different. In the Tobit
model, we are modeling economic behavior, which often yields zero outcomes; the Tobit model is
supposed to reflect this. With censored regression, we have a data collection problem because, for
some reason, the data are censored.
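The inconsistency of both OLS regressions can be seen in a small Monte Carlo sketch. The design is an assumption made for illustration (one standard normal regressor, a constant threshold, hypothetical parameter values), and `ols_slope` is a helper defined here, not something from the text.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
beta0, beta1, sigma, c = 1.0, 2.0, 1.0, 2.0   # hypothetical values

x = rng.normal(size=n)
y = beta0 + beta1 * x + rng.normal(scale=sigma, size=n)  # latent outcome
w = np.minimum(y, c)                                     # right-censored outcome
uncensored = y < c

def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

slope_full = ols_slope(x, y)                            # infeasible: needs latent y
slope_uncens = ols_slope(x[uncensored], y[uncensored])  # uncensored observations only
slope_w = ols_slope(x, w)                               # w on x, all observations

print("latent y on x:      ", slope_full)
print("uncensored only:    ", slope_uncens)
print("w on x, all obs:    ", slope_w)
```

In this design, only the infeasible regression on the latent `y` recovers a slope close to the true value of 2. Both feasible regressions give slopes well below it, and a larger sample does not remove the gap.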
QUESTION 17.5
Let \(\mathit{mvp}_i\) be the marginal value product for worker \(i\); this is the price of a firm’s good
multiplied by the marginal product of the worker. Assume \(\mathit{mvp}_i\) is a linear function of
exogenous variables, such as education, experience, and so on, and an unobservable error. Under
perfect competition and without institutional constraints, each worker is paid his or her marginal
value product. Let \(\mathit{minwage}_i\) denote the minimum wage for worker \(i\), which varies by state.
We observe \(\mathit{wage}_i\), which is the larger of \(\mathit{mvp}_i\) and \(\mathit{minwage}_i\). Write the
appropriate model for the observed wage.