s-news
[Top] [All Lists]

Cox model, df, and frailty term

To: s-news@lists.biostat.wustl.edu
Subject: Cox model, df, and frailty term
From: "José M. Fedriani" <fedriani@ebd.csic.es>
Date: Mon, 16 Jun 2008 19:49:54 +0200

Dear list,

        According to a message from Terry Therneau to this list (see below), I am choosing among different Cox models.  In so doing, I am following the "integrated likelihood view". The full model contains a frailty term, two categorical factors (of two levels each one) and their interaction.   For example, two of the compared models are:

> fit1<-coxph(Surv(time,censor)~factorA + factorB + factorA*factorB + frailty(block)

> fit2<-coxph(Surv(time,censor)~factorA + factorB +                           frailty(block)

       
Under the "integrated likelihood view" the two models should be compared based
on the integrated likelihood ("EML") in the printed out of each model. 

Because in the above example EML1= 5937 and EML2= 5939, then

 2*(5939 - 5737) = 4 ; which, when compared with a chisquare on 1df, indicate that the interaction factorA*factorB would be significant.

My question is, should I use always one degree of freedom or this depends on the number of levels of the interacting factors?

Thank you very much for your help!

 Jose





MESSAGE FROM TERRY M. THERNEAU :


  Ramon Diaz-Uriarte asks a hard question about nested models containing a
frailty term.  Assume that factorA is binary.


> fit1<-coxph(Surv(time,censor)~other.stuff+factorA+ frailty(id))
> fit2<-coxph(Surv(time,censor)~other.stuff+                frailty(id))

> chi1 <- 2*diff(fit1$loglik)
> chi2 <- 2*diff(fit2$loglik)

> 1-pchisq(chi1-chi2, df=sum(fit1$df)-sum(fit2$df))

  The above can give a negative chisquare and/or degrees of freedom, which
is certainly disturbing.


   There are two ways to look at the frailty term: as a single extra
variance parameter (integrated likelihood view), or as a large
collection of coefficients subject to a shrinkage constraint
(penalized likelihood view).

 1. If we take the first view, then the two models should be compared based
on the integrated likelihood, the term currently labeled "EM likelihood" on
the printout.  This has integrated out the individual frailty coefficients.
The test comparing the two models will have 1 degree of freedom.  In the
example below, this will give 2*(181.7-179.1)= 5.1 on 1 df: disease is
a significant covariate.

   a. Pro: the theory is worked out, & seems reliable
   b. Con: theory is only present for a single term, and that
term must be a factor variable.  (At least for the distributions supported
in Splus). 

 2. The second view treats id as a factor variable, with a penalty term
to keep the number of degrees of freedom `sensible'. This is the way
that smoothing spline (pspline) inference is done.  But if this approach
is taken, the above code may not be valid because the frailty term has
not necessarily retained the same number of degrees of freedom for the
two runs.  For example:
  > coxph(Surv(time, status) ~ sex + disease + frailty(id),  data="">                  coef se(coef)   se2 Chisq DF       p
           sex -1.477 0.357    0.357 17.14 1  3.5e-05
     diseaseGN  0.139 0.364    0.364  0.15 1  7.0e-01
     diseaseAN  0.413 0.336    0.336  1.51 1  2.2e-01
    diseasePKD -1.367 0.589    0.589  5.39 1  2.0e-02
   frailty(id)                        0.00 0  9.3e-01
  
   Iterations: 6 outer, 28 Newton-Raphson
        Variance of random effect= 5e-07   EM likelihood = -179.1
   Degrees of freedom for terms= 1 3 0
   Likelihood ratio test=17.6  on 4 df, p=0.0015  n= 76


  > coxph(Surv(time, status) ~ sex           + frailty(id),  data="">                 coef se(coef)   se2 Chisq   DF       p
           sex -1.56 0.454    0.351 11.8   1.0 0.00058
   frailty(id)                      23.1  13.2 0.04400
  
   Iterations: 6 outer, 36 Newton-Raphson
        Variance of random effect= 0.399   EM likelihood = -181.7
   Degrees of freedom for terms=  0.6 13.2
   Likelihood ratio test=45.8  on 13.78 df, p=2.63e-05  n= 76

The second model has dropped one term, but the degrees of freedom has
actually gone UP, due to a different frailty solution.  To get an "honest"
test of the disease term, compare the second model instead to

   > coxph(Surv(time, status) ~ sex + disease + frailty(id, theta=.399),
                           data="">                                coef se(coef)   se2 Chisq   DF      p
                         sex -1.731 0.465    0.369 13.86  1.0 0.0002
                   diseaseGN  0.268 0.496    0.369  0.29  1.0 0.5900
                   diseaseAN  0.493 0.451    0.343  1.20  1.0 0.2700
                  diseasePKD -0.788 0.812    0.620  0.94  1.0 0.3300
   frailty(id, theta = 0.399                       15.86 11.8 0.1900
  
   Iterations: 1 outer, 7 Newton-Raphson
        Variance of random effect= 0.399   EM likelihood = -180.1
   Degrees of freedom for terms=  0.6  1.7 11.8
   Likelihood ratio test=46.7  on 14.22 df, p=2.5e-05  n= 76

Now the test for addition of disease is (46.7 - 45.8)=0.9 on (14.22 -
13.78) = 0.44 df.  (The kidney data set is an extreme case, since both
the frailty and disease effects are dominated by a single large
outlier.  Rather innocent looking changes in the model can lead to
very different solutions.)

  a. Pro: The approach works for multiple frailty terms, and for all
frailty distributions.
          Related to the Wald test(s) beta/se(beta)
          Integrates nicely with AIC and BIC selection criteria
  b. Con: The asymptotic framework is shaky.  The number of "parameters"
increases with n.

---
  The jury is still out on which outlook is the better one.  In fact, I'm
not all that sure that the jury has begun deliberations.  But I am very
interested in the answer.



 
 ********************************************
Jose M. Fedriani
Estacion Biologica de Donana (CSIC)
Avda. Mª Luisa s/n
Sevilla 41013
Spain
Telfno: 34+954232340
Fax: 34+954621125
www2.ebd.csic.es/fedriani
<Prev in Thread] Current Thread [Next in Thread>
  • Cox model, df, and frailty term, "José M. Fedriani" <=