S-news readers,
This is not so much an S+ technical question as a statistics question,
so please ignore this if you feel I should go elsewhere for statistical
advice.
I have fit a glm to count data. In this data set there are 801
observations, with the following distribution:
Count: 0 1 2 3
# obs: 753 44 3 1
db2 <- glm(resp2 ~ offset(loffset2) + Xvar2,
data=fe2,
family=poisson(link="log"))
Null Deviance: 295.56 on 800 degrees of freedom
Residual Deviance: 291.47 on 799 degrees of freedom
Number of Fisher Scoring Iterations: 6
McCullagh and Nelder give two alternatives to estimating the dispersion
parameter, Chisq/df of Deviance/df. These give me radically different
estimates with this data set:
# Calculate Deviance estimate of dispersion
disp.dev <- with(db2,deviance/df.residual)
disp.dev
[1] 0.3647961
# Calculate Pearson Chisq estimate of dispersion
disp.chisq <-
with(db2,sum((y-fitted.values)^2/(fitted.values)/df.residual))
disp.chisq
[1] 1.233402
I have run the same analysis in SAS (gasp!) and it calculates exactly
the same dispersion estimates, so I think my formulas are correct.
Of course which dispersion estimate I choose doesn't affect my estimated
coefficient for Xvar2, only its standard error and leads me to very
different inference...
(Dispersion = 1) Value StdError t value
(Intercept) -1.50199 0.2651 -5.6658
Xvar2 0.21121 0.1036 2.0387
(Dispersion = disp.dev = 0.36479 )
(Intercept) -1.50199 0.160115 -9.3807
Xvar2 0.21121 0.062572 3.3755
(Dispersion = disp.chisq = 1.233402 )
(Intercept) -1.50199 0.29441 -5.1016
lcalls2 0.21121 0.11506 1.8357
I would greatly appreciate any advice or recommendations for papers
discussing this issue, and mostly which one should I use in this
situation?
Manuela
>::<>::<>::<>::<>::<>::<>::<>::<>::<>::<>::<>::<>::<>::<>::<>::<>::<>::<
>::<>::<>::<
Manuela Huso
Consulting Statistician
Department of Forest Ecosystems and Society
201H Richardson Hall
Oregon State University
Corvallis, OR 97331
ph: 541.737.6232
fx: 541.737.1393
|