require much more detailed analysis than can be covered in this text. For those interested
in further reading, we suggest the books by Lee (28) or Hosmer and Lemeshow (34).
Typically, for purposes of analysis, a dichotomous, or indicator, variable is used to
distinguish survival times of those patients who experienced the event of interest from
the censored times of those who did not experience the event of interest because of loss
to follow-up or being alive at the termination of the study.
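For illustration, here is a minimal Python sketch of such an indicator. It assumes the common coding 1 = event, 0 = censored. The text does not say which value means which, and the outcome labels below are hypothetical.

```python
# Hypothetical outcome labels for each patient's observed time.
EVENT = "event"
CENSORED_OUTCOMES = {"lost_to_follow_up", "alive_at_study_end"}

def event_indicator(outcome: str) -> int:
    """Return 1 for an event time and 0 for a censored time (assumed coding)."""
    if outcome == EVENT:
        return 1
    if outcome in CENSORED_OUTCOMES:
        return 0
    raise ValueError(f"unknown outcome: {outcome!r}")

outcomes = ["event", "alive_at_study_end", "event", "lost_to_follow_up"]
indicators = [event_indicator(o) for o in outcomes]
print(indicators)  # [1, 0, 1, 0]
```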
In studies involving the comparison of two treatments, we are interested in three items
of information for each patient: (1) Which treatment, A or B, was given to the patient?
(2) For what length of time was the patient observed? (3) Did the patient experience the event
of interest during the study or was he or she either lost to follow-up or alive at the end of
the study? (That is, is the observed time an event time or a censored time?) In studies that
are not concerned with the comparison of treatments or other characteristics of patients, only
the last two items of data are relevant.
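The three items can be stored as one small record per patient. The sketch below shows one possible layout, not a prescribed format. The `PatientRecord` class, its field names, the time unit, and the four records are all hypothetical. Dropping the treatment field leaves the two items that are relevant when treatments are not being compared.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PatientRecord:
    treatment: str  # item (1): "A" or "B"
    time: float     # item (2): length of observation (assumed to be in months)
    event: bool     # item (3): True = event time, False = censored time

# Made-up records, for illustration only.
records = [
    PatientRecord("A", 14.0, True),
    PatientRecord("B", 30.0, False),
    PatientRecord("A", 36.0, False),
    PatientRecord("B", 9.0, True),
]

def split_by_treatment(recs):
    """Group the (time, event) pairs by treatment, keeping input order."""
    groups = {}
    for r in recs:
        groups.setdefault(r.treatment, []).append((r.time, r.event))
    return groups

groups = split_by_treatment(records)
print(groups["A"])  # [(14.0, True), (36.0, False)]
```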
Armed with these three items of information, we are able, in studies like our
myocardial infarction example, to estimate the median survival time of the group of
patients who received treatment A and compare it with the estimated median survival time
of the group receiving treatment B. Comparing the two medians allows us to answer
the following question: Based on the information from our study, which treatment do we
conclude delays the occurrence of the event of interest for a longer period of time, on
average? In our example, the question becomes: Which treatment do we conclude delays
the occurrence of a second myocardial infarction for a longer period of time, on average?
The data collected in follow-up studies such as we have
described may also be used to answer another question of considerable interest to the
clinician: What is the estimated probability that a patient will survive for a specified length
of time? The clinician involved in our myocardial infarction study, for example, might
ask, “What is the estimated probability that, following a first heart attack, a patient
receiving treatment A will survive for more than three years?” The methods employed to answer
these questions by using the information collected during a follow-up study are known as
survival analysis methods.
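Both questions can be written in terms of a patient's survival time $T$. The following brief formalization uses notation of our own choosing, with time measured in years:

```latex
\begin{aligned}
S(t) &= P(T > t) && \text{(probability of surviving beyond time } t\text{)} \\
P(\text{survive more than three years}) &= S(3) = P(T > 3) \\
\text{median survival time} &= \inf\{\, t : S(t) \le 0.5 \,\}
\end{aligned}
```

Comparing treatments A and B then amounts to comparing estimates of the median computed separately for each group.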
The Kaplan–Meier Procedure
Now let us show how we may use the data
usually collected in follow-up studies of the type we have been discussing to estimate
the probability of surviving for a specified length of time. The method we use was
introduced by Kaplan and Meier (24) and for that reason is called the Kaplan–Meier
procedure. Since the procedure involves the successive multiplication of individual estimated
probabilities, it is sometimes referred to as the product-limit method of estimating
survival probabilities.
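The text describes the procedure only as the successive multiplication of estimated probabilities. The Python sketch below shows one common way to carry that out, under stated assumptions:

- Data are right-censored and coded 1 = event, 0 = censored.
- A subject censored at the same time as an event is still counted as at risk at that time, which is the usual convention.

The function names `kaplan_meier`, `survival_at`, and `median_survival` are hypothetical, and the six observations in the usage lines are made up.

```python
from typing import List, Optional, Tuple

def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of S(t) = P(T > t).

    times  -- observed times (event or censored)
    events -- 1 if the time is an event time, 0 if it is censored
    Returns (t, S_hat) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Count every subject whose observed time equals t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the estimated probability of surviving past t,
            # given survival up to t.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Events and censored cases both leave the risk set after time t.
        n_at_risk -= removed
    return curve

def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Estimated P(T > t): the last step at or before t (1.0 before any event)."""
    s = 1.0
    for time, value in curve:
        if time <= t:
            s = value
        else:
            break
    return s

def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Smallest event time at which the estimate drops to 0.5 or below."""
    for time, value in curve:
        if value <= 0.5:
            return time
    return None  # the estimate never reaches 0.5, so the median is not estimable

curve = kaplan_meier([3, 5, 5, 8, 10, 12], [1, 0, 1, 1, 0, 1])
print(median_survival(curve))  # 8
```

Each factor `1 - deaths / n_at_risk` is the estimated probability of surviving past an event time, given survival up to it. Censored subjects shrink the number at risk but never contribute a factor of their own.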
As we shall see, the calculations involve computing the proportions of subjects in a
sample who survive for various lengths of time. We use these sample proportions as
estimates of the survival probabilities that we would expect to observe in the population
represented by our sample. In mathematical terms, we refer to the process as
the estimation of a survivorship function. Frequency distributions and probability
distributions may be constructed from observed survival times, and these observed
distributions may show evidence of following some theoretical distribution of known functional
form. When the form of the sampled distribution is unknown, it is recommended that
the estimation of a survivorship function be accomplished by means of a nonparametric
CHAPTER 12 THE CHI-SQUARE DISTRIBUTION AND THE ANALYSIS OF FREQUENCIES