s-news
[Top] [All Lists]

Re: Prediction intervals for GAM

To: "Terry Koen" <Terry.Koen@dnr.nsw.gov.au>
Subject: Re: Prediction intervals for GAM
From: Tim Hesterberg <timh@insightful.com>
Date: 14 Feb 2006 09:49:58 -0800
Cc: <s-news@lists.biostat.wustl.edu>
In-reply-to: <s3f1a555.073@dlwcmail.dlwc.nsw.gov.au> (Terry.Koen@dnr.nsw.gov.au)
References: <s3f1a555.073@dlwcmail.dlwc.nsw.gov.au>
Prediction intervals in GAM models can be problematic.
For logistic regression, for example, the y values are 0 or 1, and the
interval would be one of:
        [0, 1]
        [0]
        [1]

One approach is to generate random values of regression parameters
(using bootstrapping, or using the estimated parameters and their
estimated covariance matrix), generate y values given those parameters
and the x values of interest, and take the middle 95% of the y values
as the prediction interval.  In the case of logistic regression, where
the y values are 0 or 1, this would give one of the intervals above.

In a large sample, where uncertainty in the parameters is small relative
to the conditional variance of y given the parameters, an approximation
would be to use the middle 95% of the conditional distribution of y,
given x and the parameters.

Tim Hesterberg

>Dear S-Plus Users,    (using S-Plus 6.2 for Windows)
> 
>A colleague of mine has asked how he can derive (and save) "95%
>prediction intervals" for a high degree Generalised Additive Model he
>has fitted.  The resulting graphic that he would produce would be
>something like the pointwise confidence intervals one can get from the
>command plot(gam.object, se=T).
> 
>I know I can use ...
>predict.gam( fit.gam, type="response", se.fit=T, pi.fit=T)
>but this gives me prediction intervals only about the linear portion of
>the GAM.
> 
>I seem to have lost my original reference to how I calculated
>confidence bands using jacknife residuals ...
>fit <- smooth.spline(NYear,NYield,df=40)
>res <- (fit$yin - fit$y)/(1-fit$lev)             # jacknife residuals
>sigma <- sqrt(var(res))                               # estimate of
>standard deviation
>upper <- fit$y + 2.0 * sigma * sqrt(fit$lev)  # upper 95% confidence
>band
>lower <- fit$y - 2.0 * sigma * sqrt(fit$lev)  # lower 95% confidence
>band
>
> 
>... but cannot help but feel the formula I need may be something like
>upper <- fit$y + 2.0 * sqrt( sigma^2  * (1 + fit$lev))
>lower <- fit$y - 2.0 * sqrt( sigma^2  * (1 + fit$lev))
> ... and that way bring in the variance about an individual point.
> 
>Any advice / insights would be gratefully accepted.
> 
>thanks, Terry
> 
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>Terry Koen
>Senior Biometrician            ph: +61 2 63419119
>CNR Research Centre       fx:  +61 2 63424551
>Dept Natural Resources
>P.O. Box 445
>Cowra 2794  NSW        email: Terry.Koen@dnr.nsw.gov.au
>Australia
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


<Prev in Thread] Current Thread [Next in Thread>