s-news
[Top] [All Lists]

step.gam() question: how to determine the proper scale argument?

To: <s-news@lists.biostat.wustl.edu>
Subject: step.gam() question: how to determine the proper scale argument?
From: <WillemW@lapp.nl>
Date: Tue, 17 Jan 2006 10:16:18 +0100
Thread-index: AcYbRqxJwVPdT6GkTjKzf5skjujxew==
Thread-topic: step.gam() question: how to determine the proper scale argument?
Dear all,

An established protocol for covariate analyses in population PK-PD modelling 
involves step.gam() as a pre-selection tool to limit the number of covariates 
that need to be included and tested in the actual PK-PD model. This tends to be 
done iteratively: step.gam() is used to test a large number of covariates 
against the posthoc estimates of the PK-PD model, those that are returned in 
the final step.gam model are included one by one in the PK-PD model, the 
covariate that results in the most significant improvement of the PK-PD model 
fit is retained, new posthocs are estimated and the procedure is repeated until 
no further covariates are returned by step.gam(). 

From a previous posting by Professor Ripley I understand that step.gam() tests
tAIC <- deviance(tfit) + 2 * (n - tfit$df.resid) * scale
i.e.  deviance + scale x estimated no of parameters (also referred to as 
'Hastie's AIC').

It follows that the penalty for adding an additional covariate to the model (or 
leaving one in) is adjusted by a scaling coefficient. According to the 
step.gam() help file, the default choice for this scale argument is the "scaled 
Chi-squared statistic for the initial model", however it is added that "if 
forward selection is to be performed, this is not necessarily a sound choice".
 
So what, then, is a sound choice when both forward and backward selection are 
to be performed?

In my experience, the default scaling parameter tends to decrease considerably 
with each iterative step in the protocol above, relaxing the penalty for 
additonal  parameters and resulting in new covariates allowed into the final 
model. Hence it seems that by default step.gam() tries very hard to include at 
least some covariates from the available pool. Visual inspection of these 
covariates against the random effects of the PK-PD model usually shows very 
little structure, and in the end they are usually discarded.
  
Is there a way to set the scale argument to step.gam() more judiciously? 

According to protocol, we have to test all covariates, so if we skip the 
step.gam() part we would have to iteratively include all covariates into the 
PK-PD model (involving NONMEM runs of many hours each for often 50+ 
covariates). Apart from the waste of time, I also have statistical reservations 
to such a 'model-mining' procedure, because with so many covariates it seems to 
me that one is bound to find 'significant' ones by pure chance alone (given an 
inclusion criterion of p<.95).

All advice is greatly appreciated.

Thanks,
Willem de Winter
w.dewinter@lapp.nl

<Prev in Thread] Current Thread [Next in Thread>
  • step.gam() question: how to determine the proper scale argument?, WillemW <=