s-news
[Top] [All Lists]

Re: question about predict.gam

To: Spencer Graves <spencer.graves@pdf.com>
Subject: Re: question about predict.gam
From: Prof Brian Ripley <ripley@stats.ox.ac.uk>
Date: Mon, 20 Oct 2003 08:19:37 +0100 (BST)
Cc: Ping Zhang <pingzhang@avaya.com>, <s-news@lists.biostat.wustl.edu>
In-reply-to: <3F9314FA.6060602@pdf.com>
On Sun, 19 Oct 2003, Spencer Graves wrote:

>       I believe the problem is "ns", and I've seen a discussion of this 
> issue somewhere, perhaps in Venables and Ripley (2002) Modern Applied 
> Statistics with S, 4th ed. or (2000) S Programming (both Springer), but 
> I don't have those books with me now, so I can't check. 

MASS4 section 6.4 (and in earlier editions too).

>       Consider the following toy problem: 
> 
> DF <- data.frame(x=1:11, y=rep(0:1, length=11))
> fit <- lm(y~ns(x, 3), DF)
> DF2 <- data.frame(x=3:5)
> predict(fit, DF2)
> 
>       When I execute this in S-Plus 6.1, I get the following: 
> 
>  > predict(fit, DF2)
>          1         2         3
>  0.3021291 0.5672211 0.3021291
> 
>       When I execute the same thing in R 1.7.1 [preceded by 
> "library(splines)"], I get the following: 
> 
>  > predict(fit, DF2)
>         1         2         3
> 0.4581415 0.5174760 0.5547733
> 
>       I have NOT checked either of these computations manually, but I 
> believe I read somewhere that R stores enough information in "fit" that 
> it can produce correct answers to these kinds of problems where S-Plus 
> currently cannot.  

Yes, that's true: R stores in the fit object a call to model.matrix which
will predict correctly for formulae it covers (including bs, ns and poly 
terms, with an extensible mechanism).

> If I still wanted to do it in S-Plus, I might try 
> something like the following: 
> 
> r4<-glm(O~log(size) + I(log(size)^2) + I(log(size)^3) + I(log(size)^4) 
> +x1+x2,data=indata,family=poisson)
> 
>       Then I'd examine the behavior of the estimated polynomial in 
> log(size) over the range of interest.  If it were sufficiently well 
> behaved, maybe I wouldn't need "ns". 

As Bill Venables has said, the standard advice is to use predict.gam, 
which is approximately correct for regression spline fits (and exactly 
correct when poly()) is used.

There is an addon to S-PLUS that claims to do `smart prediction', 
including modified versions of e.g. glm to cover this problem.  Googling 
showed it at http://www.stat.auckland.ac.nz/~yee/smartpred/index.shtml.
(Worryingly there's even a version for R 1.7.1, although it has not been a 
problem since R 1.5.0!  R's mechanism seems both simpler and more 
general.)

> Ping Zhang wrote:
> 
> > I am fitting a Poisson glm model, say
> >
> > r<-glm(O~ns(log(size),3)+x1+x2,data=indata,family=poisson)
> >
> > I understand that the predict.glm() function is unreliable and the 
> > document says it is
> > safer to use predict.gam() since a data dependent transformation is used.
> > However, I have not been able to make it work. As a simple test,
> > if I pick a given row form the input dataset and call it "newdata".
> > If I run
> >
> > predict.gam(r, type="response", newdata)
> >
> > I expect to get the r$fitted.values for the corresponding row in indata.
> > I am getting answers that are far far away from what I expect.
> >
> > Did I miss anything important? Is there anyway to make it work 
> > correctly or reliably?


-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


<Prev in Thread] Current Thread [Next in Thread>