This is one of the scandals of S-PLUS in my view. predict( ) with models
that involve predictors that are functions of the whole X matrix rather than
each component of the predictor being a function of just that row of the X
matrix.
This occurs with predictors such as ns(x, df), bs(x, df), poly(x, k) and
some others. The suggested S-PLUS solution is to use the explicit method
function predict.gam( ) rather than just the generic form, predict( ).
This advice comes from Statistical Models in S and re-inforced in MASS (with
an appropriate complaint, of course!) Here are a few concocted examples.
fm <- lm(y ~ x + x^2 + x^3, dat)
pfm <- predict(fm, newData) # works OK since the power form is used
fm <- lm(y ~ poly(x, 3, dat)
pfm <- predict(fm, newData) # will likely give nonsense
# unless newData
and dat are the same
pfm <- predict.gam(fm, newData) # should work
fm <- lm(y ~ ns(x, 4), dat)
pfm <- predict.gam(fm, newData) # may work, but if it does not
# at least you get
an obscure warning
-----Original Message-----
From: Spencer Graves [mailto:spencer.graves@pdf.com]
Sent: Monday, October 20, 2003 8:50 AM
To: Ping Zhang
Cc: s-news@lists.biostat.wustl.edu
Subject: Re: [S] question about predict.gam
I believe the problem is "ns", and I've seen a discussion of this
issue somewhere, perhaps in Venables and Ripley (2002) Modern Applied
Statistics with S, 4th ed. or (2000) S Programming (both Springer), but
I don't have those books with me now, so I can't check.
Consider the following toy problem:
DF <- data.frame(x=1:11, y=rep(0:1, length=11))
fit <- lm(y~ns(x, 3), DF)
DF2 <- data.frame(x=3:5)
predict(fit, DF2)
When I execute this in S-Plus 6.1, I get the following:
> predict(fit, DF2)
1 2 3
0.3021291 0.5672211 0.3021291
When I execute the same thing in R 1.7.1 [preceded by
"library(splines)"], I get the following:
> predict(fit, DF2)
1 2 3
0.4581415 0.5174760 0.5547733
I have NOT checked either of these computations manually, but I
believe I read somewhere that R stores enough information in "fit" that
it can produce correct answers to these kinds of problems where S-Plus
currently cannot. If I still wanted to do it in S-Plus, I might try
something like the following:
r4<-glm(O~log(size) + I(log(size)^2) + I(log(size)^3) + I(log(size)^4)
+x1+x2,data=indata,family=poisson)
Then I'd examine the behavior of the estimated polynomial in
log(size) over the range of interest. If it were sufficiently well
behaved, maybe I wouldn't need "ns".
hope this helps. spencer graves
Ping Zhang wrote:
I am fitting a Poisson glm model, say
r<-glm(O~ns(log(size),3)+x1+x2,data=indata,family=poisson)
I understand that the predict.glm() function is unreliable and the
document says it is
safer to use predict.gam() since a data dependent transformation is used.
However, I have not been able to make it work. As a simple test,
if I pick a given row form the input dataset and call it "newdata".
If I run
predict.gam(r, type="response", newdata)
I expect to get the r$fitted.values for the corresponding row in indata.
I am getting answers that are far far away from what I expect.
Did I miss anything important? Is there anyway to make it work
correctly or reliably?
Ping Zhang
Avaya labs
--------------------------------------------------------------------
This message was distributed by s-news@lists.biostat.wustl.edu. To
unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
the BODY of the message: unsubscribe s-news
--------------------------------------------------------------------
This message was distributed by s-news@lists.biostat.wustl.edu. To
unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
the BODY of the message: unsubscribe s-news