s-news
[Top] [All Lists]

Re: question about predict.gam

To: Ping Zhang <pingzhang@avaya.com>
Subject: Re: question about predict.gam
From: Spencer Graves <spencer.graves@pdf.com>
Date: Sun, 19 Oct 2003 15:49:30 -0700
Cc: s-news@lists.biostat.wustl.edu
In-reply-to: <3F92DE99.3050208@avaya.com>
References: <20031018015240.537264021AB@smtp.biostat.wustl.edu> <3F92DE99.3050208@avaya.com>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)
I believe the problem is "ns", and I've seen a discussion of this issue somewhere, perhaps in Venables and Ripley (2002) Modern Applied Statistics with S, 4th ed. or (2000) S Programming (both Springer), but I don't have those books with me now, so I can't check. Consider the following toy problem:
DF <- data.frame(x=1:11, y=rep(0:1, length=11))
fit <- lm(y~ns(x, 3), DF)
DF2 <- data.frame(x=3:5)
predict(fit, DF2)

When I execute this in S-Plus 6.1, I get the following:
> predict(fit, DF2)
        1         2         3
0.3021291 0.5672211 0.3021291

When I execute the same thing in R 1.7.1 [preceded by "library(splines)"], I get the following:
> predict(fit, DF2)
       1         2         3
0.4581415 0.5174760 0.5547733

I have NOT checked either of these computations manually, but I believe I read somewhere that R stores enough information in "fit" that it can produce correct answers to these kinds of problems where S-Plus currently cannot. If I still wanted to do it in S-Plus, I might try something like the following: r4<-glm(O~log(size) + I(log(size)^2) + I(log(size)^3) + I(log(size)^4) +x1+x2,data=indata,family=poisson)

Then I'd examine the behavior of the estimated polynomial in log(size) over the range of interest. If it were sufficiently well behaved, maybe I wouldn't need "ns".
hope this helps.  spencer graves

Ping Zhang wrote:

I am fitting a Poisson glm model, say

r<-glm(O~ns(log(size),3)+x1+x2,data=indata,family=poisson)

I understand that the predict.glm() function is unreliable and the document says it is
safer to use predict.gam() since a data dependent transformation is used.
However, I have not been able to make it work. As a simple test,
if I pick a given row form the input dataset and call it "newdata".
If I run

predict.gam(r, type="response", newdata)

I expect to get the r$fitted.values for the corresponding row in indata.
I am getting answers that are far far away from what I expect.

Did I miss anything important? Is there anyway to make it work correctly or reliably?

Ping Zhang
Avaya labs

--------------------------------------------------------------------
This message was distributed by s-news@lists.biostat.wustl.edu.  To
unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
the BODY of the message:  unsubscribe s-news



<Prev in Thread] Current Thread [Next in Thread>