Don McQueen asks for some clarification on the survfit() function, in
particular disagreements with the results of kaplanMeier(). As the
author of survfit, such notices catch my eye ....
1. By reading the documentation (help(survfit) and help(summary.survfit)),
we see that
survfit returns the standard error of the cumulative hazard function
summary.survfit prints out the survival and standard error of the
survival
And a basic survival analysis textbook will show that
survival = S(t) = exp(-cumulative hazard)
se(survival) = se(cumulative hazard)* survival
2. There are several ways to compute confidence bands for a survival curve.
The literature is unclear about which method is best, but quite clear that
the "plain" intervals of S(t) +- 1.96 se(survival) are a bad choice.
On log scale, the lower confidence value is
exp(log(survival) - 1.96 * se(cum haz))
We can check this using the leukemia data set, which is a part of Splus.
(All the below is version 6.0 on a Sun workstation)
> fit1 <- survfit(Surv(time, status) ~1, leukemia)
> fit1$surv
[1] 0.91304348 0.82608696 0.78260870 0.73913043 0.69565217 etc
> fit1$std.err
[1] 0.06434895 0.09567297 0.10989675 0.12387602 0.13791932 etc
> exp(log(0.91304348) - 1.96*0.06434895)
[1] 0.8048529
> summary(fit1)
time n.risk n.event survival std.err lower 95% CI upper 95% CI
5 23 2 0.9130 0.0588 0.8049 1.000
8 21 2 0.8261 0.0790 0.6848 0.996
9 19 1 0.7826 0.0860 0.6310 0.971
etc
This looks good to me.
3) The kaplanMeier function for this data set returns
> kaplanMeier(censor(time, status) ~1, leukemia)
Number Observed: 23
Number Censored: 5
Confidence Type: log
Survival Std.Err 95% LCL 95% UCL
(-Inf, 5] 1.000 0.000 1.000 1.000
( 5, 8] 0.913 0.059 0.814 1.000
( 8, 9] 0.826 0.079 0.708 0.964
etc
I didn't write this routine, so I can only guess at what computation might
have generated 0.814. The incorrect calculation
exp(log(0.91304348) - 1.96*0.06434895 *0.91304348)
appears to be what was done. But it could also be due to a confidence
level of .0735 instead of .05 --- the definitive answer on this will have
to come from StatSci.
Terry Therneau
PS The routines don't agree for naive intervals either, for which the
proper result is 0.9130 - 1.96*.0588
> summary(survfit(Surv(time, status) ~1, leukemia, conf.type='plain'))
time n.risk n.event survival std.err lower 95% CI upper 95% CI
5 23 2 0.9130 0.0588 0.7979 1.000
8 21 2 0.8261 0.0790 0.6712 0.981
etc.
> kaplanMeier(censor(time, status) ~1, leukemia, conf.interval='identity')
Number Observed: 23
Number Censored: 5
Confidence Type: identity
Survival Std.Err 95% LCL 95% UCL
(-Inf, 5] 1.000 0.000 1.000 1.000
( 5, 8] 0.913 0.059 0.808 1.000
( 8, 9] 0.826 0.079 0.698 0.954
etc
|