A bigger issue is that stepwise regression distorts F statistics,
whether you use forward or backward selection.
An F statistic comparing nested models has approximately an F
distribution only when both models were specified a priori, not
when one of the models is chosen because it appears best (or worst,
in the case of backward stepwise regression).
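
To make the distortion concrete, here is a small simulation sketch in
Python (standard library only). Everything in it is an illustrative
assumption rather than part of the original analysis: the response is
pure noise, there are 10 independent candidate predictors, n = 52, and
the 5% critical value of F(1, 50) is hard-coded as about 4.034. It
compares how often the F test rejects for a predictor named in advance
with how often it rejects for the predictor that looks best, which is
what the first forward step picks.

```python
import random


def f_stat_single(x, y):
    """F statistic (1 and n-2 df) for adding x to an intercept-only model."""
    n = len(y)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r2 = sxy * sxy / (sxx * syy)          # squared sample correlation
    return r2 * (n - 2) / (1.0 - r2)


def rejection_rates(n=52, p=10, n_sim=1000, f_crit=4.034, seed=1):
    """Return (rate for a prespecified predictor, rate for the best one).

    f_crit is the assumed 0.95 quantile of F(1, n-2) for n = 52;
    change it if you change n.
    """
    rng = random.Random(seed)
    prespecified = 0
    selected = 0
    for _ in range(n_sim):
        # Null case: y is unrelated to every candidate predictor.
        y = [rng.gauss(0, 1) for _ in range(n)]
        fs = [
            f_stat_single([rng.gauss(0, 1) for _ in range(n)], y)
            for _ in range(p)
        ]
        prespecified += fs[0] > f_crit    # predictor chosen a priori
        selected += max(fs) > f_crit      # predictor chosen because it looks best
    return prespecified / n_sim, selected / n_sim


if __name__ == "__main__":
    pre, best = rejection_rates()
    print("prespecified predictor:", pre)
    print("best-looking predictor:", best)
```

The prespecified rate should land near the nominal 0.05, while the
rate for the selected predictor should be far higher (with 10
independent candidates, roughly 1 - 0.95^10, or about 0.40).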
As an alternative to stepwise regression, consider least angle
regression / lasso / L1-regularized regression (closely related ways
of doing much the same thing). See the 'glars' package at
http://csan.insightful.com, based on the 'glmpath' package by
Mee Young Park & Trevor Hastie. For background, see
Efron et al., Least Angle Regression, Annals of Statistics 2004,
32(2), 407-451.
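
For readers who have not seen L1 regularization, a minimal sketch in
Python (standard library only) may help. Note that this is plain
coordinate descent with soft-thresholding for a linear model, not the
path algorithm used by 'glars' or 'glmpath', and not least angle
regression itself. The function names, the penalty value lam = 0.5, and
the simulated data are all assumptions made for illustration. The
objective is (1/(2n)) * RSS + lam * sum(|beta_j|) on standardized
predictors.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def standardize(col):
    """Center a column and scale it to unit (population) variance."""
    n = len(col)
    m = sum(col) / n
    s = (sum((v - m) ** 2 for v in col) / n) ** 0.5
    return [(v - m) / s for v in col]


def lasso_cd(columns, y, lam, n_iter=200):
    """Lasso coefficients (standardized scale) by coordinate descent."""
    n = len(y)
    xs = [standardize(c) for c in columns]
    ybar = sum(y) / n
    resid = [v - ybar for v in y]         # residuals with all beta = 0
    beta = [0.0] * len(xs)
    for _ in range(n_iter):
        for j, xj in enumerate(xs):
            # Covariance of x_j with the partial residual (x_j's own
            # contribution added back in).
            rho = sum(x * r for x, r in zip(xj, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * x for r, x in zip(resid, xj)]
                beta[j] = new
    return beta


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 100, 5
    cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    # Only the first predictor actually matters in this made-up example.
    y = [2.0 * cols[0][i] + rng.gauss(0, 1) for i in range(n)]
    print(lasso_cd(cols, y, lam=0.5))
```

The point of the example is that the penalty sets the coefficients of
the irrelevant predictors to exactly zero and shrinks the real one,
so selection and estimation happen in a single fit rather than through
a sequence of F tests.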
Regarding the reviewer's point: you could construct examples where
starting with the full model is better. If n (the number of
observations) is much larger than p (the number of candidate terms),
I would do so (except that I'd use lasso instead).
Tim Hesterberg
>I used the following table to show the significant terms in a GLM model in my
>paper. One reviewer made the following comment:
>"Look at the column of DF it looks like the model was fitted using step-wise
>regression. This distorts the F-statistics since these would be F-to-enter. A
>more appropriate F would be to start with the full model and calculate F to
>delete one at a time. This would mean the error residuals are always estimated
>using the full model".
>
>I did this analysis quite some time ago and lost all the code due to my
>relocation. My question is when fitting GLMs, can we choose to start with a
>null model or a full model? Does this really make a difference? Thank you so
>much for your help.
>
>Regards,
>
>Yiwu