s-news
[Top] [All Lists]

Re: comparing null deviance to residual deviance in step.glm

To: Megan Ferguson <Megan.Ferguson@noaa.gov>
Subject: Re: comparing null deviance to residual deviance in step.glm
From: Prof Brian Ripley <ripley@stats.ox.ac.uk>
Date: Tue, 4 Oct 2005 20:00:02 +0100 (BST)
Cc: s-news@lists.biostat.wustl.edu
In-reply-to: <434160CF.5060502@noaa.gov>
References: <434160CF.5060502@noaa.gov>
I am afraid this is a tale of woe. For some reason Insightful continues to supply a function, step.glm, with show-stopping bugs. (I looked at the code in S-PLUS 7, not having 6.1 to hand.)


1) I think the crux here is the offset. First, step.glm returns the results of glm.fit, whereas glm recomputes the null deviance for models with offsets, since glm.fit gets it wrong. So step.glm is wrong.

Second, step.glm calls add1.lm and drop1.lm which do not know about offsets, and so its approximations to the AIC changes are in a completely incorrect model.

This has been known for many years.


2) You have a conceptual problem here. step.glm claims to minimize AIC. For AIC to be defined you need a log-likelihood, and the point about quasi-likelihood fits is that you do not have a likelihood. Your quasi-likelihood is that of a quasi-Poisson model with a lot of over-dispersion, but does not correspond to the likelihood of any over-dispersed Poisson model. If your data are counts I would consider using a negative-binomial GLM.

Model selection in quasi-likelihood GLMs is AFAIK a research topic.


3) It is well known that step.glm does not do what it claims, both because it uses a too-crude approximation and because it has a different definition of AIC from Akaike (and almost everyone else). That's the reason behind stepAIC in the MASS library section.



On Mon, 3 Oct 2005, Megan Ferguson wrote:

Hi,

I'm using S-PLUS Professional Edition Version 6.1.2 Release 1 for Windows to build a quasi-likelihood glm with a log link and variance proportional to the mean. I've been using the "glm" command to build a null model, which I then feed to "step.glm" to test a range of variables for inclusion in the model. My code is as follows:

####
null.indglm.stripe10 <- glm(formula = indstripe ~offset(log(oneffort20)),
family = quasi(link = log, variance = "mu"), data = all.10.ind13)

indglm1.stripe10 <- step.glm(null.indglm.stripe10, direction="both", scope=~
frontsd2
+ distarc + distarc2
+ depth + depth2
+ pslope + pslope2
+ temp + temp2
+ sal + sal2
+ tcd + tcd2
+ tcs + tcs2
+ ezd + ezd2
+ log(chcon) + (log(chcon))2
+ beauf + beauf2,
keep=keepit)
####

I would like to look at the percent change in deviance between the null model and the model that step.glm selects. To do this, I used the "summary" command on indglm1.stripe10. The value returned for the null deviance is LESS than that for the residual deviance:

###
Null Deviance: 238718.1 on 14522 degrees of freedom

Residual Deviance: 258702.1 on 14508 degrees of freedom
###

Then I called for a summary of the null model (null.indglm.stripe10) and a much larger value was returned for the null deviance:

###
Null Deviance: 278765.6 on 14522 degrees of freedom

Residual Deviance: 278765.6 on 14522 degrees of freedom
###

It looks like step.glm is using the larger value for the null deviance in computing AIC (I see this when I get the printout from indglm1.stripe10$anova). Why is the estimated value of the null deviance different between the summary of the null model and the summary of the best model from step.glm? Does this have to do with my specification of the quasi-likelihood family? Is the second value of the null deviance derived from a revised estimate of the scale (dispersion) parameter? When does the estimated value of the null deviance change in the step.glm algorithm?

Thanks for your time.

Megan


--
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

<Prev in Thread] Current Thread [Next in Thread>