For example, we would expect subjects to be different in their baseline systolic blood
pressure measurements, family history of heart disease, weight, body mass, and other
characteristics. Because all of these factors may influence the length of the time interval
until a second myocardial infarction, we would like to account for these factors in
determining the effectiveness of the medications. The regression method known as Cox
regression (after D. R. Cox, who first proposed the method) or proportional hazards
regression can be used to account for the effects of continuous and discrete covariate
(independent variable) measurements when the dependent variable is possibly censored
time-until-event data.
We describe this technique by first introducing the hazard function, which describes
the conditional probability that an event will occur at a time just larger than $t_i$,
conditional on having survived event-free until time $t_i$. This conditional probability
is also known as the instantaneous failure rate at time $t_i$ and is often written as the
function $h(t_i)$. The regression model requires that we assume the covariates have the
effect of either increasing or decreasing the hazard for a particular individual compared
to some baseline value for the function. In our clinical trial example we might measure
$k$ covariates on each of the subjects, where there are $i = 1, \ldots, n$ subjects and
$h_0(t_i)$ is the baseline hazard function. We describe the regression model as
$$h(t_i) = h_0(t_i)\exp\left(\beta_1 z_{i1} + \beta_2 z_{i2} + \cdots + \beta_k z_{ik}\right) \tag{12.8.3}$$
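As a minimal sketch of how Equation 12.8.3 turns covariate values into a hazard, the following Python function multiplies a baseline hazard by the exponentiated linear combination of the covariates. The function name `hazard`, the baseline value, the coefficients, and the covariate codings are all hypothetical and chosen only for illustration; they are not taken from the clinical trial example.

```python
import math


def hazard(baseline_hazard, coefficients, covariates):
    """Evaluate h(t_i) = h_0(t_i) * exp(b_1*z_i1 + ... + b_k*z_ik)."""
    if len(coefficients) != len(covariates):
        raise ValueError("need exactly one coefficient per covariate")
    # Linear combination of the subject's covariates and the coefficients
    linear_predictor = sum(b * z for b, z in zip(coefficients, covariates))
    return baseline_hazard * math.exp(linear_predictor)


# Hypothetical illustration values (assumptions, not estimates from data)
h0 = 0.02                 # baseline hazard at some time t_i
betas = [0.5, -0.3]       # coefficients for two 0/1 covariates
baseline_subject = [0, 0]
other_subject = [1, 1]

print(hazard(h0, betas, baseline_subject))  # equals h0, since exp(0) = 1
print(hazard(h0, betas, other_subject))     # h0 scaled by exp(0.5 - 0.3)
```

A subject whose covariates are all zero has exactly the baseline hazard, which is why the covariates are said to raise or lower the hazard relative to a baseline.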
The regression coefficients represent the change in the hazard, on the logarithmic scale,
that results from the risk factor, $z_{ik}$, that we have measured. Rearranging the above
equation shows that the exponentiated coefficient represents the hazard ratio, or the
ratio of the conditional probabilities of an event. This is the basis for naming this
method proportional hazards regression. You may recall that this is the same way we
obtained the estimate of the odds ratio from the estimated coefficient when we discussed
logistic regression in Chapter 11.
$$\frac{h(t_i)}{h_0(t_i)} = \exp\left(\beta_1 z_{i1} + \beta_2 z_{i2} + \cdots + \beta_k z_{ik}\right) \tag{12.8.4}$$
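The hazard ratio in Equation 12.8.4 can be illustrated with a short Python sketch. Because the baseline hazard appears in both numerator and denominator, it cancels when two subjects are compared, so only the coefficients and the difference in covariate values are needed. The function name `hazard_ratio` and the coefficient values are hypothetical and serve only as an illustration.

```python
import math


def hazard_ratio(coefficients, covariates_a, covariates_b):
    """Ratio of the hazard for subject a to the hazard for subject b.

    The baseline hazard h_0(t) cancels, leaving
    exp(sum_k b_k * (z_ak - z_bk)).
    """
    difference = sum(
        b * (za - zb)
        for b, za, zb in zip(coefficients, covariates_a, covariates_b)
    )
    return math.exp(difference)


# Hypothetical coefficients for two 0/1 covariates (assumed values)
betas = [0.5, -0.3]

# Subjects that differ by one unit in the second covariate only:
# the ratio reduces to the exponentiated coefficient exp(-0.3).
print(hazard_ratio(betas, [1, 1], [1, 0]))

# Comparison with a baseline subject (all covariates zero), as in 12.8.4
print(hazard_ratio(betas, [1, 1], [0, 0]))
```

The ratio does not depend on time, which is the sense in which the hazards are proportional.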
Estimating the covariate effects, $\hat{\beta}$, requires the use of a statistical
software package because there is no straightforward single equation that will provide
the estimates for this regression model. Computer output usually includes estimates of
the regression coefficients, standard error estimates, hazard ratio estimates, and
confidence intervals. In addition, computer output may provide graphs of the hazard
functions and survival functions for subjects with different covariate values, which are
useful for comparing the effects of covariates on survival. In summary, Cox regression is
a useful technique for determining the effects of covariates with survival data.
Additional information can be found in the texts by Kleinbaum (27), Lee (28), Kalbfleisch
and Prentice (30), Elandt-Johnson and Johnson (31), Cox and Oakes (32), and Fleming
and Harrington (33).
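To suggest why software is needed, the following Python sketch estimates a single coefficient iteratively. It assumes the estimate is obtained by maximizing Cox's partial likelihood with Newton-Raphson steps, a standard approach that is not described in the text above. It also assumes one covariate and no tied event times. The data are invented for illustration, and the function names are hypothetical; a real analysis would rely on a statistical package.

```python
import math


def score_and_information(times, events, z, beta):
    """Score and information of the Cox partial log-likelihood (one covariate)."""
    score = 0.0
    information = 0.0
    for t_i, d_i, z_i in zip(times, events, z):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: subjects still under observation at time t_i
        risk = [z_j for t_j, z_j in zip(times, z) if t_j >= t_i]
        weights = [math.exp(beta * z_j) for z_j in risk]
        s0 = sum(weights)
        s1 = sum(w * z_j for w, z_j in zip(weights, risk))
        s2 = sum(w * z_j * z_j for w, z_j in zip(weights, risk))
        mean = s1 / s0
        score += z_i - mean
        information += s2 / s0 - mean ** 2
    return score, information


def fit_cox_one_covariate(times, events, z, tol=1e-10, max_iter=50):
    """Newton-Raphson estimate of beta and its standard error."""
    beta = 0.0
    for _ in range(max_iter):
        score, information = score_and_information(times, events, z, beta)
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    _, information = score_and_information(times, events, z, beta)
    return beta, 1.0 / math.sqrt(information)


# Invented data: follow-up time, event indicator (1 = event, 0 = censored),
# and a 0/1 covariate such as treatment group
times = [5, 8, 12, 14, 20, 21, 30, 33, 40, 45]
events = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]
z = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

beta_hat, se = fit_cox_one_covariate(times, events, z)
hr = math.exp(beta_hat)
ci_low = math.exp(beta_hat - 1.96 * se)   # approximate 95% limits
ci_high = math.exp(beta_hat + 1.96 * se)
print(beta_hat, se, hr, (ci_low, ci_high))
```

The printed quantities correspond to the usual computer output: a coefficient estimate, its standard error, the hazard ratio, and an approximate 95% confidence interval for the hazard ratio.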
660 CHAPTER 12 THE CHI-SQUARE DISTRIBUTION AND THE ANALYSIS OF FREQUENCIES