s-news
[Top] [All Lists]

Bug in kaplanMeier

To: macq@llnl.gov
Subject: Bug in kaplanMeier
From: Terry Therneau <therneau@mayo.edu>
Date: Fri, 17 Nov 2000 18:43:55 -0600 (CST)
Cc: s-news@wubios.wustl.edu
Don McQueen asks for some clarification on the survfit() function, in
particular disagreements with the results of kaplanMeier().  As the
author of survfit, such notices catch my eye ....

1. By reading the documentation (help(survfit) and help(summary.survfit)),
we see that
        survfit returns the standard error of the cumulative hazard function
        summary.survfit prints out the survival and standard error of the
                survival

And a basic survival analysis textbook will show that
  survival = S(t)   = exp(-cumulative hazard)
  se(survival)      = se(cumulative hazard)* survival


2. There are several ways to compute confidence bands for a survival curve.
The literature is unclear about which method is best, but quite clear that
the "plain" intervals of S(t) +- 1.96 se(survival) are a bad choice.

  On log scale, the lower confidence value is

        exp(log(survival) - 1.96 * se(cum haz))

We can check this using the leukemia data set, which is a part of Splus.
(All the below is version 6.0 on a Sun workstation)

> fit1 <- survfit(Surv(time, status) ~1, leukemia)
> fit1$surv
 [1] 0.91304348 0.82608696 0.78260870 0.73913043 0.69565217 etc

> fit1$std.err
 [1] 0.06434895 0.09567297 0.10989675 0.12387602 0.13791932 etc

> exp(log(0.91304348) - 1.96*0.06434895)
[1] 0.8048529

> summary(fit1)
 time n.risk n.event survival std.err lower 95% CI upper 95% CI 
    5     23       2   0.9130  0.0588       0.8049        1.000
    8     21       2   0.8261  0.0790       0.6848        0.996
    9     19       1   0.7826  0.0860       0.6310        0.971
        etc

 This looks good to me.


3) The kaplanMeier function for this data set returns

> kaplanMeier(censor(time, status) ~1, leukemia)
Number Observed: 23 
Number Censored: 5 
Confidence Type: log 
            Survival Std.Err 95% LCL 95% UCL 
(-Inf,   5]    1.000   0.000   1.000   1.000
(   5,   8]    0.913   0.059   0.814   1.000
(   8,   9]    0.826   0.079   0.708   0.964
        etc

  I didn't write this routine, so I can only guess at what computation might
have generated 0.814.  The incorrect calculation
         exp(log(0.91304348) - 1.96*0.06434895 *0.91304348)
appears to be what was done.  But it could also be due to a confidence
level of .0735 instead of .05 --- the definitive answer on this will have
to come from StatSci.

        Terry Therneau



PS  The routines don't agree for naive intervals either, for which the
proper result is  0.9130 - 1.96*.0588

> summary(survfit(Surv(time, status) ~1, leukemia, conf.type='plain'))

 time n.risk n.event survival std.err lower 95% CI upper 95% CI 
    5     23       2   0.9130  0.0588       0.7979        1.000
    8     21       2   0.8261  0.0790       0.6712        0.981
                etc.
> kaplanMeier(censor(time, status) ~1, leukemia, conf.interval='identity')

Number Observed: 23 
Number Censored: 5 
Confidence Type: identity 
            Survival Std.Err 95% LCL 95% UCL 
(-Inf,   5]    1.000   0.000   1.000   1.000
(   5,   8]    0.913   0.059   0.808   1.000
(   8,   9]    0.826   0.079   0.698   0.954
        etc


        

<Prev in Thread] Current Thread [Next in Thread>
  • Bug in kaplanMeier, Terry Therneau <=