I was ashamed to see what I had written - I need to check things more
carefully before hitting "Send". Please disregard the previous note.
Dear Hormuzd,
A few quick thoughts:
1. If you spend a lot of effort optimizing the knot locations or the number of
knots, it becomes difficult to get the correct confidence bands for the
functional form. Model uncertainty adds to variance. This can be called the
"phantom degrees of freedom" problem (see Ye, JASA 93:120, 1998).
2. Durrleman and Simon showed that if you fix the number of knots in a
restricted cubic spline (natural spline), allowing the knot locations to be
estimated via maximum likelihood results in very little improvement in the log
likelihood, i.e., the shapes are not that sensitive to the knot locations.
Shape is sensitive to the number of knots. So I think you are prioritizing the
correct aspect of the estimation problem.
3. AIC is a pretty good and quick method for determining the number of knots,
provided you fix the knot locations for each number of knots (I use a set of
default percentiles for this). But refer back to 1. The approach I've been
taking lately is instead to use subject matter knowledge or rank correlation
measures (between the outcome and each predictor you decide to force into the
model) to decide how many d.f. to spend on each predictor, and then to spend
them. This results in confidence intervals with the claimed coverage
probability. I use a generalization of Spearman's rho that allows for
non-monotonic relationships (see the spearman2 function in the new version of
the Hmisc library available from our web page).
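
As a rough illustration of point 3, here is a minimal sketch on simulated data.
It is not the code behind the results mentioned above. Base R functions (glm and
splines::ns) stand in for the S-Plus library functions, and the helper gen.rho2
is a hypothetical re-implementation of the idea behind a generalized Spearman
rho-squared (rank of y regressed on rank of x and its square), not the spearman2
function itself. The knot percentiles in knot.pct and all variable names are
assumptions made for the example.

```r
library(splines)  # ships with R; ns() builds a natural (restricted) cubic spline basis

set.seed(1)
n  <- 500
x1 <- runif(n, -2, 2)   # simulated predictor with a U-shaped effect
x2 <- rnorm(n)          # simulated predictor with a weak linear effect
y  <- rbinom(n, 1, plogis(-1 + x1^2 + 0.3 * x2))

# Hypothetical helper: generalized Spearman rho^2, taken here as the R^2 from
# regressing rank(y) on rank(x) and rank(x)^2, so that a non-monotonic
# relationship still registers.
gen.rho2 <- function(x, y) {
  rx <- rank(x)
  summary(lm(rank(y) ~ rx + I(rx^2)))$r.squared
}

# Step 1: rank the predictors before looking at any model fit; predictors with
# larger values are candidates for more d.f. (more knots).
rho2 <- c(x1 = gen.rho2(x1, y), x2 = gen.rho2(x2, y))

# Step 2 (the quicker AIC route): fix the knot locations at preset percentiles
# for each number of knots k, and compare AIC across k.
# The percentiles below are assumed values for illustration.
knot.pct <- list("3" = c(.10, .50, .90),
                 "4" = c(.05, .35, .65, .95),
                 "5" = c(.05, .275, .50, .725, .95))

aic.for.k <- function(k) {
  kn    <- unname(quantile(x1, knot.pct[[as.character(k)]]))
  inner <- kn[-c(1, k)]   # interior knots
  bnd   <- kn[c(1, k)]    # outer knots; the fit is linear beyond them
  fit   <- glm(y ~ ns(x1, knots = inner, Boundary.knots = bnd) + x2,
               family = binomial)
  AIC(fit)
}

aic <- sapply(3:5, aic.for.k)
names(aic) <- 3:5

print(round(rho2, 3))
print(round(aic, 1))
print(names(aic)[which.min(aic)])  # number of knots with the smallest AIC
```

With k knots the spline term uses k - 1 d.f., so the comparison above is between
2, 3 and 4 d.f. for x1. Keep point 1 in mind: confidence bands from the model
chosen in step 2 do not account for the search over k, whereas allocating d.f.
from step 1 alone does not use the fitted models to choose.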
-Frank Harrell
-------------------------------------------------------------------------------------------
Frank E Harrell Jr
Professor of Biostatistics and Statistics
Division of Biostatistics and Epidemiology
Department of Health Evaluation Sciences
University of Virginia School of Medicine
hesweb1.med.virginia.edu/biostat
>
>-----Original Message-----
>From: Katki, Hormuzd (NCI) <katkih@mail.nih.gov>
>To: 's-news' <s-news@wubios.wustl.edu>
>Cc: Rosenberg, Philip (NCI) <rosenbep@epndce.nci.nih.gov>
>Date: Wednesday, April 07, 1999 1:55 PM
>Subject: [S] Logistic Regression Model Selection in S-Plus
>
>
>>
>>Dear S-Plus users,
>>
>>The issue is how to select the optimal # of knots for covariates represented
>>as cubic splines in a logistic regression. The goal is to determine the
>>relationship between probability of outcome and the covariates, rather than
>>use the model to predict outcome of individual observations.
>>
>>1. What is the state-of-the-art method for this? We are thinking
>>cross-validation, but posts on the S-news mailing list between Roy Pardee,
>>Frank Harrell and Brian Ripley last year suggest to me that the bootstrap may
>>be better:
>>http://www.biostat.wustl.edu/hyperlists/s-news/199803/msg00135.html
>>http://www.biostat.wustl.edu/hyperlists/s-news/199803/msg00140.html
>>http://www.biostat.wustl.edu/hyperlists/s-news/199803/msg00156.html
>>http://www.biostat.wustl.edu/hyperlists/s-news/199803/msg00163.html
>>
>>2. Are there any S-Plus functions that implement the state-of-the-art method?
>>For example, I have experimented with Frank Harrell's libraries, but his
>>concern is validating models for the purpose of prediction and it's not clear
>>to me how I can use his libraries to select the optimal # of knots (if it is
>>clear, please inform me).
>>
>>Any help is greatly appreciated! Thank you for your time,
>>
>>Hormuzd Katki
>>Biostatistics Branch, Division of Cancer Epidemiology and Genetics
>>National Cancer Institute
>>6120 Executive Blvd. Room 8044 MSC 7244
>>Bethesda MD 20892-7244
>>301-594-7818 (voice)
>>301-402-0081 (fax)
>>katkih@mail.nih.gov
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news