s-news
[Top] [All Lists]

SUMMARY: glm(...,family=poisson) transformation

To: s-news@lists.biostat.wustl.edu
Subject: SUMMARY: glm(...,family=poisson) transformation
From: Joe Sexton <sharpernail@yahoo.com>
Date: Thu, 13 Feb 2003 15:48:17 -0800 (PST)
Got it! You guys are brilliant. I owe you a debt equal
to today's progress on my master's thesis. If I could,
I'd buy you an e-beer.

CHARLES ANDERSON WROTE:
Try specifying the X and X^2 values in the model
statement rather than
using poly(). I think poly generates orthogonal
polynomials, which then
cause you problems with predictions. (Double check
whether the values
yielded by predict(...,type='response') are correct.
Maybe you need
predict.gam or something else. Check poly in the index
of Venables and
Ripley (MASS, 1997).

BRAD BIGGERSTAFF WROTE:
One problem I see is that you're using poly() to
define polynomial 
functions
of predictors, yet your "prediction equation" doesn't
use the poly()
parameterization.  For example:

> x <- 1:10
> y <- rpois(10,2)
> glm(y~x+x^2,family=poisson)
Call:
glm(formula = y ~ x + x^2, family = poisson)

Coefficients:
 (Intercept)         x      I(x^2) 
   0.1347568 0.3760669 -0.03313325

Degrees of Freedom: 10 Total; 7 Residual
Residual Deviance: 4.218207 
> glm(y~poly(x,2),family=poisson)
Call:
glm(formula = y ~ poly(x, 2), family = poisson)

Coefficients:
 (Intercept) poly(x, 2)1 poly(x, 2)2 
   0.9274946   0.1053726   -0.761344

Degrees of Freedom: 10 Total; 7 Residual
Residual Deviance: 4.218207 

....
That these yield the same predictions (using predict)
shows that S-Plus
knows which coefficients to use:

> predict.gam(glm(y~x+x^2,family=poisson))
         1         2         3        4       5       
6        7        
8 
 0.4776904 0.7543576 0.9647582 1.108892 1.18676
1.198361 1.143696 
1.022764

         9        10 
 0.8355656 0.5821008
> predict.gam(glm(y~poly(x,2),family=poisson))
         1         2         3        4       5       
6        7        
8 
 0.4776904 0.7543576 0.9647582 1.108892 1.18676
1.198361 1.143696 
1.022764

         9        10 
 0.8355656 0.5821008

.....

There is hope, though:

You can use the function poly.transform to get the
transformed 
coefficients
(for export), so that you can still use poly() in the
fitting:

>
poly.transform(poly(x,2),coef(glm(y~poly(x,2),family=poisson)))
       x^0       x^1         x^2 
 0.1347568 0.3760669 -0.03313325
> coef(glm(y~x+x^2,family=poisson))
 (Intercept)         x      I(x^2) 
   0.1347568 0.3760669 -0.03313325
> 

Then you can do your personal (non-predict) predicting
and transforming
using these coefficients, or just use the ones from
poly().


JOE SEXTON WROTE (response to Jose María Fedriani
Laffitte, Biggerstaff, Brad J., and John Fox):
As for specifying (...,type='response'), my problem is
that the larger dataset to which I am applying the
regression equation is pragmatically too large to
import into S Plus. (I'm regressing tree percent
canopy cover from about 150 Mb of satellite remotely
sensed imagery.) Therefore, I have to find a way to
manually predict my sample-calibrated equation in an
image-processing software that can't create GLM's on
its own.

As for using the exp() transformation, here are some
results:

> coefficients(dougfir100.x.glm)
                          Value Std. Error    t value 
         (Intercept)  -4.991452   1.176776 -4.2416314
   poly(fall.tc1, 1) -12.739892  15.805608 -0.8060362
  poly(fall.ndvi, 1)   7.341085   6.594062  1.1132872
  poly(sumr.tc1, 2)1   4.707552  20.419682  0.2305399
  poly(sumr.tc1, 2)2   9.380219  10.596375  0.8852290
  poly(sumr.tc2, 2)1   3.081442  20.380504  0.1511956
  poly(sumr.tc2, 2)2  -6.603397  12.827755 -0.5147742
poly(aspen.30.r, 2)1  -2.507069   6.497554 -0.3858481
poly(aspen.30.r, 2)2  -5.316410   4.269693 -1.2451505

Applying the coefficients above to the training
vectors (in S Plus),

>0-4.991452-12.739892*(fall.tc1)+7.341085*(fall.ndvi)+4.707552*(sumr.tc1)+9.380219*(sumr.tc1^2)+3.081442*(sumr.tc2)-6.603397*(sumr.tc2^2)-2.507069*(aspen.30.r)-5.316410*(aspen.30.r^2)

yields these results:

        1        2        3        4... 
 398728.1 350754.5 497222.4 317500.5...

But when the exp() function is applied to transform
the predictions--as Jose and I had thought
appropriate--to (...,family='poisson'), the values are
a bit off:

> exp(x)
   1   2   3   4... 
 Inf Inf Inf Inf...

This is especially interesting when compared to the
values yielded by predict(...,type='response'),
recommended by Brad and John:

> predict.glm(dougfir1.x.glm,contrain,type='response'
1           2           3         4...
0.003677987 0.008057072 0.0102533 0.008796681,

which are perfectly appropriate.

So, the question remains, is S Plus using some
function besides exp() as the transformation? Anyone?


JOE SEXTON WROTE:
I regressed a count variable on 5 ratio-scale
predictors using glm(...,family=poisson)in S Plus 2000
(Windows 2000). The fit was good (1-res/null deviance
= 0.75, all variables significant, errors distributed
neatly across observed values, etc.). I then
extrapolated predictions across the very large dataset
from which the training dataset was sampled by
applying the summary() coefficients to their
respective variables. 
 The predicted values, however, are not on the same
scale as the response, and I have been unable to
transform the predictions to the appropriate scale.

WHAT EQUATION DOES S PLUS 2000 USE TO TRANSFORM THE
LINEAR PREDICTIONS OF GLM(...,FAMILY=POISSON) TO THE
RESPONSE SCALE?

=====
Landscape Ecology: Modeling and Analysis Center  
& Department of Forest, Range, and Wildlife Science
Utah State University
     
"Go deeper."
-Zarathustra

__________________________________________________
Do you Yahoo!?
Yahoo! Shopping - Send Flowers for Valentine's Day
http://shopping.yahoo.com

<Prev in Thread] Current Thread [Next in Thread>
  • SUMMARY: glm(...,family=poisson) transformation, Joe Sexton <=