
Re: [S] Logistic Regression Model Selection in S-Plus

To: s-news <s-news@wubios.wustl.edu>
Subject: Re: [S] Logistic Regression Model Selection in S-Plus
From: Frank E Harrell Jr <fharrell@virginia.edu>
Date: Wed, 7 Apr 1999 14:10:51 -0400
Sender: owner-s-news@wubios.wustl.edu
Dear Hormuzd,

A few quick thoughts:

1. If you spend a lot of effort in optimizing the knot locations, it becomes
    difficult to get the correct confidence bands for the functional form.
    Model uncertainty adds to variance.

2. Durrleman and Simon showed that if you fix the number of knots in a
    restricted cubic spline (natural spline), allowing the knot locations to
    be estimated via maximum likelihood results in very little improvement in
    the log likelihood, i.e., the shapes are not that sensitive to the
    locations.  Shape is sensitive to # knots.

3. AIC is a pretty good and quick method for determining # knots, given you
    fix the knot locations for each # of knots (I use a set of default
    percentiles for this).  But refer back to 1.  The approach I've been
    taking lately is to use subject matter knowledge or rank correlation
    measures to decide how many d.f. to spend on each predictor, then to
    spend that.  This results in confidence intervals with the claimed
    coverage probability.
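
As a rough illustration of points 2 and 3, the sketch below (Python, not
S-Plus, standard library only) places knots at fixed percentiles, builds a
restricted cubic spline basis, and computes AIC from a log likelihood.  The
percentile table is an assumption: the message only says "a set of default
percentiles", and the values here are illustrative.  The function names
(default_knots, rcs_basis, aic) are hypothetical, and the logistic fit that
supplies the log likelihood is left to whatever routine you already use.

```python
# Assumed knot percentiles for k = 3..7 knots. These values are NOT given in
# the message; they are an illustrative assumption.
DEFAULT_KNOT_QUANTILES = {
    3: [0.10, 0.50, 0.90],
    4: [0.05, 0.35, 0.65, 0.95],
    5: [0.05, 0.275, 0.50, 0.725, 0.95],
    6: [0.05, 0.23, 0.41, 0.59, 0.77, 0.95],
    7: [0.025, 0.1833, 0.3417, 0.50, 0.6583, 0.8167, 0.975],
}


def quantile(sorted_x, p):
    """Linearly interpolated sample quantile of an already sorted list."""
    h = (len(sorted_x) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (h - lo) * (sorted_x[hi] - sorted_x[lo])


def default_knots(x, k):
    """Knot locations for k knots, fixed at the assumed percentiles of x."""
    s = sorted(x)
    return [quantile(s, p) for p in DEFAULT_KNOT_QUANTILES[k]]


def rcs_basis(x, knots):
    """Restricted cubic spline basis for one value x.

    k knots give k - 1 columns (the linear term plus k - 2 nonlinear terms),
    so the predictor spends k - 1 d.f.  The fitted curve is linear beyond the
    outer knots.
    """
    t_last, t_prev = knots[-1], knots[-2]
    scale = (t_last - knots[0]) ** 2  # keeps the columns on the scale of x

    def cube(u):
        return max(u, 0.0) ** 3

    row = [x]
    for t in knots[:-2]:
        term = (cube(x - t)
                - cube(x - t_prev) * (t_last - t) / (t_last - t_prev)
                + cube(x - t_last) * (t_prev - t) / (t_last - t_prev))
        row.append(term / scale)
    return row


def aic(log_likelihood, n_params):
    """Akaike information criterion; smaller is better."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Usage: one design row per observation, for each candidate number of knots.
x = list(range(1, 101))
for k in (3, 4, 5):
    knots = default_knots(x, k)
    design = [rcs_basis(xi, knots) for xi in x]
    # Fit the logistic model on `design`, then compare aic(loglik, k)
    # across k (k - 1 spline coefficients plus the intercept).
```

Choosing k by AIC this way is still a data-driven choice, so the caveat in
point 1 about confidence bands applies to it as well.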

-Frank Harrell


-----Original Message-----
From: Katki, Hormuzd (NCI) <katkih@mail.nih.gov>
To: 's-news' <s-news@wubios.wustl.edu>
Cc: Rosenberg, Philip (NCI) <rosenbep@epndce.nci.nih.gov>
Date: Wednesday, April 07, 1999 1:55 PM
Subject: [S] Logistic Regression Model Selection in S-Plus


>
>Dear S-Plus users,
>
>The issue is how to select the optimal # of knots for covariates represented as
>cubic splines in a logistic regression.  The goal is to determine the
>relationship between probability of outcome and the covariates, rather than use
>the model to predict outcome of individual observations.
>
>1.  What is the state-of-the-art method for this?  We are thinking
>cross-validation, but posts on the S-news mailing list between Roy Pardee,
>Frank Harrell and Brian Ripley last year suggest to me that the bootstrap
>may be better:
>http://www.biostat.wustl.edu/hyperlists/s-news/199803/msg00135.html
>http://www.biostat.wustl.edu/hyperlists/s-news/199803/msg00140.html
>http://www.biostat.wustl.edu/hyperlists/s-news/199803/msg00156.html
>http://www.biostat.wustl.edu/hyperlists/s-news/199803/msg00163.html
>
>2.  Are there any S-Plus functions that implement the state-of-the-art method?
>For example, I have experimented with Frank Harrell's libraries, but his
>concern is validating models for the purpose of prediction and it's not clear
>to me how I can use his libraries to select the optimal # of knots (if it is
>clear, please inform me).
>
>Any help is greatly appreciated!  Thank you for your time,
>
>Hormuzd Katki
>Biostatistics Branch, Division of Cancer Epidemiology and Genetics
>National Cancer Institute
>6120 Executive Blvd. Room 8044 MSC 7244
>Bethesda MD 20892-7244
>301-594-7818 (voice)
>301-402-0081 (fax)
>katkih@mail.nih.gov
>
>-----------------------------------------------------------------------
>This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
>send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
>message:  unsubscribe s-news

