s-news
[Top] [All Lists]

Re: question about predict.gam

To: spencer.graves@pdf.com, pingzhang@avaya.com
Subject: Re: question about predict.gam
From: Bill.Venables@csiro.au
Date: Mon, 20 Oct 2003 10:15:18 +1000
Cc: s-news@lists.biostat.wustl.edu
This is one of the scandals of S-PLUS in my view.  predict( ) with models
that involve predictors that are functions of the whole X matrix rather than
each component of the predictor being a function of just that row of the X
matrix.

This occurs with predictors such as ns(x, df), bs(x, df), poly(x, k) and
some others.  The suggested S-PLUS solution is to use the explicit method
function predict.gam( ) rather than just the generic form, predict( ).

This advice comes from Statistical Models in S and re-inforced in MASS (with
an appropriate complaint, of course!)  Here are a few concocted examples.

fm <- lm(y ~ x + x^2 + x^3, dat)
pfm <- predict(fm, newData)        # works OK since the power form is used

fm <- lm(y ~ poly(x, 3, dat)
pfm <- predict(fm, newData)               # will likely give nonsense 
                                                          # unless newData
and dat are the same
pfm <- predict.gam(fm, newData)   # should work

fm <- lm(y ~ ns(x, 4), dat)
pfm <- predict.gam(fm, newData)   # may work, but if it does not 
                                                          # at least you get
an obscure warning

-----Original Message-----
From: Spencer Graves [mailto:spencer.graves@pdf.com]
Sent: Monday, October 20, 2003 8:50 AM
To: Ping Zhang
Cc: s-news@lists.biostat.wustl.edu
Subject: Re: [S] question about predict.gam


      I believe the problem is "ns", and I've seen a discussion of this 
issue somewhere, perhaps in Venables and Ripley (2002) Modern Applied 
Statistics with S, 4th ed. or (2000) S Programming (both Springer), but 
I don't have those books with me now, so I can't check. 

      Consider the following toy problem: 

DF <- data.frame(x=1:11, y=rep(0:1, length=11))
fit <- lm(y~ns(x, 3), DF)
DF2 <- data.frame(x=3:5)
predict(fit, DF2)

      When I execute this in S-Plus 6.1, I get the following: 

 > predict(fit, DF2)
         1         2         3
 0.3021291 0.5672211 0.3021291

      When I execute the same thing in R 1.7.1 [preceded by 
"library(splines)"], I get the following: 

 > predict(fit, DF2)
        1         2         3
0.4581415 0.5174760 0.5547733

      I have NOT checked either of these computations manually, but I 
believe I read somewhere that R stores enough information in "fit" that 
it can produce correct answers to these kinds of problems where S-Plus 
currently cannot.  If I still wanted to do it in S-Plus, I might try 
something like the following: 

r4<-glm(O~log(size) + I(log(size)^2) + I(log(size)^3) + I(log(size)^4) 
+x1+x2,data=indata,family=poisson)

      Then I'd examine the behavior of the estimated polynomial in 
log(size) over the range of interest.  If it were sufficiently well 
behaved, maybe I wouldn't need "ns". 

hope this helps.  spencer graves

Ping Zhang wrote:

> I am fitting a Poisson glm model, say
>
> r<-glm(O~ns(log(size),3)+x1+x2,data=indata,family=poisson)
>
> I understand that the predict.glm() function is unreliable and the 
> document says it is
> safer to use predict.gam() since a data dependent transformation is used.
> However, I have not been able to make it work. As a simple test,
> if I pick a given row form the input dataset and call it "newdata".
> If I run
>
> predict.gam(r, type="response", newdata)
>
> I expect to get the r$fitted.values for the corresponding row in indata.
> I am getting answers that are far far away from what I expect.
>
> Did I miss anything important? Is there anyway to make it work 
> correctly or reliably?
>
> Ping Zhang
> Avaya labs
>
> --------------------------------------------------------------------
> This message was distributed by s-news@lists.biostat.wustl.edu.  To
> unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
> the BODY of the message:  unsubscribe s-news


--------------------------------------------------------------------
This message was distributed by s-news@lists.biostat.wustl.edu.  To
unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
the BODY of the message:  unsubscribe s-news

<Prev in Thread] Current Thread [Next in Thread>