Part 3 Advanced Topics
security number almost certainly satisfies the first requirement: it is uncorrelated with
ability because it is determined randomly. However, this variable is not correlated with
education, so it makes a poor instrumental variable for educ.
What we have called a proxy variable for the omitted variable makes a poor IV for
the opposite reason. For example, in the log(wage) example with omitted ability, a
proxy variable for abil must be as highly correlated as possible with abil. An instrumental
variable must be uncorrelated with abil. Therefore, while IQ is a good candidate
as a proxy variable for abil, it is not a good instrumental variable for educ.
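
The contrast between these two failed candidates can be illustrated with a small simulation. The sketch below is only an illustration under made-up assumptions: the data-generating process, the coefficients, and the variable names ssn_digit and iq are hypothetical and are not taken from any data set used in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process; every coefficient here is an assumption.
abil = rng.normal(size=n)                            # unobserved ability
educ = 12 + 2.0 * abil + rng.normal(size=n)          # education depends on ability
ssn_digit = rng.integers(0, 10, size=n)              # random digit, standing in for an SSN digit
iq = 100 + 10 * abil + rng.normal(scale=5, size=n)   # noisy measure of ability


def corr(a, b):
    """Sample correlation between two arrays."""
    return float(np.corrcoef(a, b)[0, 1])


corr_ssn_abil = corr(ssn_digit, abil)   # near zero: exogenous
corr_ssn_educ = corr(ssn_digit, educ)   # near zero: irrelevant, so a poor IV
corr_iq_abil = corr(iq, abil)           # large: good proxy, but not exogenous
corr_iq_educ = corr(iq, educ)           # large, yet only through ability

print(f"corr(ssn_digit, abil) = {corr_ssn_abil: .3f}")
print(f"corr(ssn_digit, educ) = {corr_ssn_educ: .3f}")
print(f"corr(iq, abil)        = {corr_iq_abil: .3f}")
print(f"corr(iq, educ)        = {corr_iq_educ: .3f}")
```

In this simulated setting the random digit is uncorrelated with both abil and educ, so it fails the relevance requirement, while iq is strongly correlated with abil, which is what makes it a useful proxy and an invalid instrument.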
The requirements are less clear-cut for other possible instrumental variable candidates.
In wage equations, labor economists have used family background variables as
IVs for education. For example, mother’s education (motheduc) is positively correlated
with child’s education, as can be seen by collecting a sample of data on working people
and running a simple regression of educ on motheduc. Therefore, motheduc satisfies
equation (15.5). The problem is that mother’s education might also be correlated
with child’s ability (through mother’s ability and perhaps quality of nurturing at an
early age), in which case motheduc would be correlated with the error term and would
not be a valid instrument.
Another IV choice for educ in (15.1) is number of siblings while growing up (sibs).
Typically, having more siblings is associated with lower average levels of education. Thus, if
number of siblings is uncorrelated with ability, it can act as an instrumental variable for
educ.
As a second example, consider the problem of estimating the causal effect of skipping
classes on final exam score. In a simple regression framework, we have
$$ \mathit{score} = \beta_0 + \beta_1\,\mathit{skipped} + u, \tag{15.8} $$
where score is the final exam score, and skipped is the total number of lectures missed
during the semester. We certainly might be worried that skipped is correlated with other
factors in u: better students might miss fewer classes. Thus, a simple regression of score
on skipped may not give us a good estimate of the causal effect of missing classes.
What might be a good IV for skipped? We need something that has no direct effect
on score and is not correlated with student ability. At the same time, the IV must be
correlated with skipped. One option is to use distance between living quarters and campus.
Some students at a large university will commute to campus, which may increase the
likelihood of missing lectures (due to bad weather, oversleeping, and so on). Thus,
skipped may be positively correlated with distance; this can be checked by regressing
skipped on distance and doing a t test, as described earlier.
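
The relevance check just described, a simple regression of skipped on distance followed by a t test on the slope, might look as follows. This is a minimal sketch on simulated data: the sample size, the distribution of distance, and the coefficients are all assumptions chosen for illustration, and simple_regression is a hypothetical helper that computes the usual (homoskedasticity-based) t statistic.

```python
import numpy as np


def simple_regression(y, x):
    """OLS of y on an intercept and x.

    Returns (slope, standard error of the slope, t statistic for slope = 0),
    using the usual homoskedasticity-based standard error.
    """
    n = len(y)
    x_dev = x - x.mean()
    sxx = np.sum(x_dev ** 2)
    slope = np.sum(x_dev * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    resid = y - intercept - slope * x
    sigma2 = np.sum(resid ** 2) / (n - 2)   # error variance estimate
    se = np.sqrt(sigma2 / sxx)
    return float(slope), float(se), float(slope / se)


rng = np.random.default_rng(1)
n = 500

# Hypothetical data: distance in miles, lectures missed rising with distance.
distance = rng.exponential(scale=3.0, size=n)
skipped = rng.poisson(lam=2.0 + 0.5 * distance).astype(float)

slope, se, t_stat = simple_regression(skipped, distance)
print(f"slope = {slope:.3f}, se = {se:.3f}, t = {t_stat:.2f}")
```

A large t statistic on distance is evidence that the instrument is correlated with skipped. It says nothing about whether distance is uncorrelated with u, which cannot be checked this way.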
Is distance uncorrelated with u? In the simple regression model (15.8), some factors
in u may be correlated with distance. For example, students from low-income families
may live off campus; if income affects student performance, this could cause distance
to be correlated with u. Section 15.2 shows how to use IV in the context of multiple
regression, so that other factors affecting score can be included directly in the model.
Then, distance might be a good IV for skipped. An IV approach may not be necessary
at all if a good proxy exists for student ability, such as cumulative GPA prior to the
semester.
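
To see what is at stake, the sketch below compares the OLS slope with a simple IV estimate in simulated data where distance is a valid instrument by construction. Everything here is an assumption made for illustration: the true effect beta1 = -1, the other coefficients, and the treatment of skipped as a continuous variable. The IV estimate is computed as the ratio of the sample covariance of the instrument with score to its sample covariance with skipped, which is the standard simple IV estimator and is assumed here rather than derived.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
beta1 = -1.0   # hypothetical true effect of one missed lecture on score

# Hypothetical data-generating process (all coefficients are assumptions).
abil = rng.normal(size=n)                       # unobserved ability
distance = rng.exponential(scale=3.0, size=n)   # independent of ability by construction
skipped = 5 + 0.5 * distance - 1.5 * abil + rng.normal(size=n)
u = 4.0 * abil + rng.normal(size=n)             # error term contains ability
score = 80 + beta1 * skipped + u


def cov(a, b):
    """Sample covariance between two arrays."""
    return float(np.cov(a, b)[0, 1])


# OLS slope: biased because skipped is correlated with u through ability.
ols_slope = cov(skipped, score) / cov(skipped, skipped)

# Simple IV estimate using distance as the instrument for skipped.
iv_slope = cov(distance, score) / cov(distance, skipped)

print(f"true beta1 = {beta1:.2f}")
print(f"OLS slope  = {ols_slope:.2f}")
print(f"IV slope   = {iv_slope:.2f}")
```

In this setup better students skip fewer lectures, so OLS overstates the harm from missing class, while the IV estimate lands close to the assumed true effect. If distance were correlated with u, for instance through family income, the IV estimate would no longer be reliable.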
We now demonstrate that the availability of an instrumental variable can be used to
consistently estimate the parameters in equation (15.2). In particular, we show that