s-news
[Top] [All Lists]

Re: question about predict.gam

To: Bill.Venables@csiro.au
Subject: Re: question about predict.gam
From: Spencer Graves <spencer.graves@pdf.com>
Date: Sun, 19 Oct 2003 17:41:52 -0700
Cc: pingzhang@avaya.com, s-news@lists.biostat.wustl.edu
In-reply-to: <E09E527B56BE2D438A3D6A246DDD27A916610E@roper-cv.qld.cmis.csiro.au>
References: <E09E527B56BE2D438A3D6A246DDD27A916610E@roper-cv.qld.cmis.csiro.au>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)
Hi, Bill: Am I correct that R does not have this problem?
Thanks for your comment.  spencer graves

Bill.Venables@csiro.au wrote:

This is one of the scandals of S-PLUS in my view.  predict( ) with models
that involve predictors that are functions of the whole X matrix rather than
each component of the predictor being a function of just that row of the X
matrix.

This occurs with predictors such as ns(x, df), bs(x, df), poly(x, k) and
some others.  The suggested S-PLUS solution is to use the explicit method
function predict.gam( ) rather than just the generic form, predict( ).

This advice comes from Statistical Models in S and re-inforced in MASS (with
an appropriate complaint, of course!)  Here are a few concocted examples.

fm <- lm(y ~ x + x^2 + x^3, dat)
pfm <- predict(fm, newData)        # works OK since the power form is used

fm <- lm(y ~ poly(x, 3, dat)
pfm <- predict(fm, newData) # will likely give nonsense # unless newData
and dat are the same
pfm <- predict.gam(fm, newData)        # should work

fm <- lm(y ~ ns(x, 4), dat)
pfm <- predict.gam(fm, newData) # may work, but if it does not # at least you get
an obscure warning

-----Original Message-----
From: Spencer Graves [mailto:spencer.graves@pdf.com]
Sent: Monday, October 20, 2003 8:50 AM
To: Ping Zhang
Cc: s-news@lists.biostat.wustl.edu
Subject: Re: [S] question about predict.gam


I believe the problem is "ns", and I've seen a discussion of this issue somewhere, perhaps in Venables and Ripley (2002) Modern Applied Statistics with S, 4th ed. or (2000) S Programming (both Springer), but I don't have those books with me now, so I can't check. Consider the following toy problem:
DF <- data.frame(x=1:11, y=rep(0:1, length=11))
fit <- lm(y~ns(x, 3), DF)
DF2 <- data.frame(x=3:5)
predict(fit, DF2)

When I execute this in S-Plus 6.1, I get the following:
> predict(fit, DF2)
        1         2         3
0.3021291 0.5672211 0.3021291

When I execute the same thing in R 1.7.1 [preceded by "library(splines)"], I get the following:
> predict(fit, DF2)
       1         2         3
0.4581415 0.5174760 0.5547733

I have NOT checked either of these computations manually, but I believe I read somewhere that R stores enough information in "fit" that it can produce correct answers to these kinds of problems where S-Plus currently cannot. If I still wanted to do it in S-Plus, I might try something like the following: r4<-glm(O~log(size) + I(log(size)^2) + I(log(size)^3) + I(log(size)^4) +x1+x2,data=indata,family=poisson)

Then I'd examine the behavior of the estimated polynomial in log(size) over the range of interest. If it were sufficiently well behaved, maybe I wouldn't need "ns".
hope this helps.  spencer graves

Ping Zhang wrote:

I am fitting a Poisson glm model, say

r<-glm(O~ns(log(size),3)+x1+x2,data=indata,family=poisson)

I understand that the predict.glm() function is unreliable and the document says it is
safer to use predict.gam() since a data dependent transformation is used.
However, I have not been able to make it work. As a simple test,
if I pick a given row form the input dataset and call it "newdata".
If I run

predict.gam(r, type="response", newdata)

I expect to get the r$fitted.values for the corresponding row in indata.
I am getting answers that are far far away from what I expect.

Did I miss anything important? Is there anyway to make it work correctly or reliably?

Ping Zhang
Avaya labs

--------------------------------------------------------------------
This message was distributed by s-news@lists.biostat.wustl.edu.  To
unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
the BODY of the message:  unsubscribe s-news


--------------------------------------------------------------------
This message was distributed by s-news@lists.biostat.wustl.edu.  To
unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
the BODY of the message:  unsubscribe s-news


<Prev in Thread] Current Thread [Next in Thread>