| To: | s-news@lists.biostat.wustl.edu |
|---|---|
| Subject: | Cox model, df, and frailty term |
| From: | "José M. Fedriani" <fedriani@ebd.csic.es> |
| Date: | Mon, 16 Jun 2008 19:49:54 +0200 |
|
Dear list, According to a message from Terry Therneau to this list (see below), I am choosing among different Cox models. In so doing, I am following the "integrated likelihood view". The full model contains a frailty term, two categorical factors (of two levels each one) and their interaction. For example, two of the compared models are: > fit1<-coxph(Surv(time,censor)~factorA + factorB + factorA*factorB + frailty(block) > fit2<-coxph(Surv(time,censor)~factorA + factorB + frailty(block) Under the "integrated likelihood view" the two models should be compared based on the integrated likelihood ("EML") in the printed out of each model. Because in the above example EML1= 5937 and EML2= 5939, then 2*(5939 - 5737) = 4 ; which, when compared with a chisquare on 1df, indicate that the interaction factorA*factorB would be significant. My question is, should I use always one degree of freedom or this depends on the number of levels of the interacting factors? Thank you very much for your help! Jose MESSAGE FROM TERRY M. THERNEAU : Ramon Diaz-Uriarte asks a hard question about nested models containing a frailty term. Assume that factorA is binary. > fit1<-coxph(Surv(time,censor)~other.stuff+factorA+ frailty(id)) > fit2<-coxph(Surv(time,censor)~other.stuff+ frailty(id)) > chi1 <- 2*diff(fit1$loglik) > chi2 <- 2*diff(fit2$loglik) > 1-pchisq(chi1-chi2, df=sum(fit1$df)-sum(fit2$df)) The above can give a negative chisquare and/or degrees of freedom, which is certainly disturbing. There are two ways to look at the frailty term: as a single extra variance parameter (integrated likelihood view), or as a large collection of coefficients subject to a shrinkage constraint (penalized likelihood view). 1. If we take the first view, then the two models should be compared based on the integrated likelihood, the term currently labeled "EM likelihood" on the printout. This has integrated out the individual frailty coefficients. The test comparing the two models will have 1 degree of freedom. In the example below, this will give 2*(181.7-179.1)= 5.1 on 1 df: disease is a significant covariate. a. Pro: the theory is worked out, & seems reliable b. Con: theory is only present for a single term, and that term must be a factor variable. (At least for the distributions supported in Splus). 2. The second view treats id as a factor variable, with a penalty term to keep the number of degrees of freedom `sensible'. This is the way that smoothing spline (pspline) inference is done. But if this approach is taken, the above code may not be valid because the frailty term has not necessarily retained the same number of degrees of freedom for the two runs. For example: > coxph(Surv(time, status) ~ sex + disease + frailty(id), data=""> coef se(coef) se2 Chisq DF p sex -1.477 0.357 0.357 17.14 1 3.5e-05 diseaseGN 0.139 0.364 0.364 0.15 1 7.0e-01 diseaseAN 0.413 0.336 0.336 1.51 1 2.2e-01 diseasePKD -1.367 0.589 0.589 5.39 1 2.0e-02 frailty(id) 0.00 0 9.3e-01 Iterations: 6 outer, 28 Newton-Raphson Variance of random effect= 5e-07 EM likelihood = -179.1 Degrees of freedom for terms= 1 3 0 Likelihood ratio test=17.6 on 4 df, p=0.0015 n= 76 > coxph(Surv(time, status) ~ sex + frailty(id), data=""> coef se(coef) se2 Chisq DF p sex -1.56 0.454 0.351 11.8 1.0 0.00058 frailty(id) 23.1 13.2 0.04400 Iterations: 6 outer, 36 Newton-Raphson Variance of random effect= 0.399 EM likelihood = -181.7 Degrees of freedom for terms= 0.6 13.2 Likelihood ratio test=45.8 on 13.78 df, p=2.63e-05 n= 76 The second model has dropped one term, but the degrees of freedom has actually gone UP, due to a different frailty solution. To get an "honest" test of the disease term, compare the second model instead to > coxph(Surv(time, status) ~ sex + disease + frailty(id, theta=.399), data=""> coef se(coef) se2 Chisq DF p sex -1.731 0.465 0.369 13.86 1.0 0.0002 diseaseGN 0.268 0.496 0.369 0.29 1.0 0.5900 diseaseAN 0.493 0.451 0.343 1.20 1.0 0.2700 diseasePKD -0.788 0.812 0.620 0.94 1.0 0.3300 frailty(id, theta = 0.399 15.86 11.8 0.1900 Iterations: 1 outer, 7 Newton-Raphson Variance of random effect= 0.399 EM likelihood = -180.1 Degrees of freedom for terms= 0.6 1.7 11.8 Likelihood ratio test=46.7 on 14.22 df, p=2.5e-05 n= 76 Now the test for addition of disease is (46.7 - 45.8)=0.9 on (14.22 - 13.78) = 0.44 df. (The kidney data set is an extreme case, since both the frailty and disease effects are dominated by a single large outlier. Rather innocent looking changes in the model can lead to very different solutions.) a. Pro: The approach works for multiple frailty terms, and for all frailty distributions. Related to the Wald test(s) beta/se(beta) Integrates nicely with AIC and BIC selection criteria b. Con: The asymptotic framework is shaky. The number of "parameters" increases with n. --- The jury is still out on which outlook is the better one. In fact, I'm not all that sure that the jury has begun deliberations. But I am very interested in the answer. ******************************************** Jose M. Fedriani Estacion Biologica de Donana (CSIC) Avda. Mª Luisa s/n Sevilla 41013 Spain Telfno: 34+954232340 Fax: 34+954621125 www2.ebd.csic.es/fedriani |
| <Prev in Thread] | Current Thread | [Next in Thread> |
|---|---|---|
| ||
| Previous by Date: | timeDate difference help, Michael Slattery |
|---|---|
| Next by Date: | Re: Passing arguments to a script file called in batch mode (fwd), Glenn Treacy |
| Previous by Thread: | timeDate difference help, Michael Slattery |
| Next by Thread: | Re: Passing arguments to a script file called in batch mode (fwd), Glenn Treacy |
| Indexes: | [Date] [Thread] [Top] [All Lists] |