s-news
[Top] [All Lists]

[S] kaplanMeier vs. survfit in S+2000

To: S-news@wubios.wustl.edu
Subject: [S] kaplanMeier vs. survfit in S+2000
From: "Molinari, Luciano" <Luciano.Molinari@kispi.unizh.ch>
Date: Mon, 29 May 2000 09:17:26 +0200
Sender: owner-s-news@wubios.wustl.edu
I claim there is a bug in kaplanMeier. I tried to get in touch with MathSoft
in this respect, with no success, therefore I am sending the message to the
list asking for comments.
L. Molinari
================================
I have come across a questionable piece of code in the function kaplanMeier.
(S+2000 Rel2 under NT4.0)

But first compare the results of survfit and kaplanMeier:
-----------Call to survfit page 233 in S+ Guide to Statistics vol2 
S+> summary(survfit(Surv(time,status) ~ group,leukemia))
Call: survfit(formula = Surv(time, status) ~ group, data = leukemia)

                group=Maintained 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI 
    9     11       1    0.909  0.0867       0.7541        1.000
   13     10       1    0.818  0.1163       0.6192        1.000
   18      8       1    0.716  0.1397       0.4884        1.000
   23      7       1    0.614  0.1526       0.3769        0.999
   31      5       1    0.491  0.1642       0.2549        0.946
   34      4       1    0.368  0.1627       0.1549        0.875
   48      2       1    0.184  0.1535       0.0359        0.944

                group=Nonmaintained 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI 
    5     12       2   0.8333  0.1076       0.6470        1.000
    8     10       2   0.6667  0.1361       0.4468        0.995
   12      8       1   0.5833  0.1423       0.3616        0.941
   23      6       1   0.4861  0.1481       0.2675        0.883
   27      5       1   0.3889  0.1470       0.1854        0.816
   30      4       1   0.2917  0.1387       0.1148        0.741
   33      3       1   0.1944  0.1219       0.0569        0.664
   43      2       1   0.0972  0.0919       0.0153        0.620
   45      1       1   0.0000      NA           NA           NA
---------------- Then call to kaplanMeier --------------------
S+> kaplanMeier(censor(time,status) ~ group,leukemia)
group=Maintained 
Number Observed: 11 
Number Censored: 4 
Confidence Type: log 
            Survival Std.Err 95% LCL 95% UCL 
(-Inf,   9]    1.000   0.000   1.000   1.000
(   9,  13]    0.909   0.087   0.767   1.000
(  13,  18]    0.818   0.116   0.651   1.000
(  18,  23]    0.716   0.140   0.544   0.941
(  23,  31]    0.614   0.153   0.455   0.828
(  31,  34]    0.491   0.164   0.356   0.677
(  34,  48]    0.368   0.163   0.268   0.506
(  48, 161]    0.184   0.153   0.136   0.249

group=Nonmaintained 
Number Observed: 12 
Number Censored: 1 
Confidence Type: log 
            Survival Std.Err 95% LCL 95% UCL 
(-Inf,   5]    1.000   0.000   1.000   1.000
(   5,   8]    0.833   0.108   0.675   1.000
(   8,  12]    0.667   0.136   0.511   0.870
(  12,  23]    0.583   0.142   0.441   0.771
(  23,  27]    0.486   0.148   0.364   0.650
(  27,  30]    0.389   0.147   0.292   0.519
(  30,  33]    0.292   0.139   0.222   0.383
(  33,  43]    0.194   0.122   0.153   0.247
(  43,  45]    0.097   0.092   0.081   0.116
(  45, Inf)    0.000   0.000      NA      NA
-----------------------------------------------------------
The survival function AND the standard errors are identical, but LCL and UCL
are quite different, and much narrower for kaplanMeier, and this is true for
the other conf.type options (ident or plain and log-log) also.
The source of the difference is in the following piece of code in
kaplanMeier, which does not correspond to page 238-239 of the S+ Guide to
Statistics, Vol.2. 
Here the code of kaplanMeier:
----------------------------------
                                if(conf.interval == "identity") {
                                  temp1 <- surv + zval * se * surv
                                  temp2 <- surv - zval * se * surv
                                  fiti <- cbind(fiti, lower = pmax(temp2,
0), upper = pmin(
                                    temp1, 1))
                                }
                                if(conf.interval == "log") {
                                  xx <- ifelse(surv == 0, 1, surv)      
        #avoid some "log(0)" messages
                                  temp1 <- ifelse(surv == 0, NA, exp(log(xx)
+ zval * se))
                                  temp2 <- ifelse(surv == 0, NA, exp(log(xx)
- zval * se))
                                  fiti <- cbind(fiti, lower = temp2, upper =
pmin(temp1, 1))
                                }
                                if(conf.interval == "log-log") {
                                  who <- (surv == 0 | surv == 1)
#special cases
                                  temp3 <- ifelse(surv == 0, NA, 1)
                                  xx <- ifelse(who, 0.1, surv)  #avoid some
"log(0)" messages
                                  temp1 <- exp( - exp(log( - log(xx)) +
(zval * se)/log(xx)))
                                  temp1 <- ifelse(who, temp3, temp1)
                                  temp2 <- exp( - exp(log( - log(xx)) -
(zval * se)/log(xx)))
                                  temp2 <- ifelse(who, temp3, temp2)
                                  fiti <- cbind(fiti, lower = temp2, upper =
temp1)
                                }
-------------------------------
For example, the formula at the bottom of page 238, would imply the
following code
               temp1 <- surv + zval * se
rather then the one given above.
Either the formulas of page 238-239 are incorrect or the above code.

Notice that for the survfit function the confidence interval is calculated
in survfit.km, but the code is only apparently identical with the above
code, because the variable std.err is different! In fact the final std.err
is calculated in summary.survfit by multiplying it by the survival function.
All this is not very transparent and easily leads to misunderstandings.

It is easy to modify kaplanMeier to give the same results as survfit, but
before I do that I would like to have your opinion.
Best regards,
Luciano Molinari
-----------------------
Dr. Luciano Molinari
Abt. EEG und AWE
Kinderspital Zürich
Steinwiesstr.75
CH-8032 Zürich
luciano.molinari@kispi.unizh.ch
Tel. (00411) 266 79 25 oder (00411) 266 75 66
Fax (00411) 266 71 71

-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message:  unsubscribe s-news

<Prev in Thread] Current Thread [Next in Thread>
  • [S] kaplanMeier vs. survfit in S+2000, Molinari, Luciano <=