I'm writing a function which the (simplified) core is
myfunction <- function(formula=formula(data), data=sys.parent())
{
n <- nrow(data)
thiscsample <- data[1:(n-10),] # construction sample
thisvsample <- data[(n-9):n,] # validation sample
thisgam <- gam(formula=formula,data=thiscsample) # Fit on the
construction sample
predict.gam(thisgam,newdata=thisvsample,type="terms") # Extract
prediction on the validation sample
}
When I call the function on my dataset, this call works:
> myfunction(y ~ s(x1,df=3)+s(x2,df=4), data = mydata)
but not this one
> dfx1 <- 3; dfx2 <- 4
> myfunction(y ~ s(x1,df=dfx1)+s(x2,df=dfx2), data = mydata)
> Problem in myfunction(y ~ s(x1, df = dfx1) +..: Length of dfx1
(variable 2) is 1 != length of others (405)
It doesn't work, because when predicting on new data, predict.gam cannot
access the degrees of freedom. I think the solution goes through the use
of substitute() (that I have been able to use succesfully in others
circumstances), but I can't figure out here how to define the list to be
"substituted" (For the use of substitute() with gam() see also the
message below from the Snews archive).
Any idea?
Thank you for your help,
Eva Cantoni
-----------------------------------------------------------------------------
# John Thaden, from the S-new mailing list ( Tue, 08 Sep 1998 01:52:46
-0500).
In May, I wrote both S-News and MathSoft customer support because
predict.gam() often gave me arcane error messages, such as . . .
Error in lm.fit.qr(x, y, singular.ok = ..1): Number of
observations in x and y not equal
or . . .
Error in safe.predict.gam(object, newdata, type,..:
Length of variable 1 is 1 != length of row names (40)
or . . .
Warning in x * w.factor: Length of longer object is
not a multiple of the length of the shorter object
Error in names<-: Invalid length for names attribute:
y = structure(..
Thanks to Brian Ripley, Rachel Fewster, Bill Venables, StatSci's Julie
Dryden, and an unnamed StatSci/MathSoft statistician, I've gained insight
into predict.gam() that I'll share:
A command
>predict.gam(object, newdata)
results in a call to safe.predict.gam(), which among other things uses
rbind.data.frame() to make a new data frame that includes both original
data and newdata (this is integral to the `safe' method of predict.gam). I
think most problems are problems with safe.predict.gam getting the original
data. So far, I know of two such problems.
The first is violation of scoping rules. Reviewing these . . .
/ When you try to reference a variable in a function, S looks first/
/ for the variable /
/ (1) in that function/
/ (2) NOT in any function that called that function/
/ (3) on frame 1/
/ (4) on frame 0/
/ (5) on permanent data bases./
So if the old data was created in a function you write that also calls
predict.gam(), then safe.predict.gam() won't see it. The solution is to
assign the old data to frame 1. A special case of this is for() loops, so
from Bill Venables via the archives:
/tmp <- -20:20/
/old.data <- data.frame(x = tmp, y = tmp + tmp^2 + rnorm(tmp))/
/new.x <- data.frame(x = 21)/
/for(i in 1:3) {/
>/+ ff <- lm(y ~ poly(x, i), old.data)/
>/+ print(predict.gam(ff, new.x))/
>/}/
Error in safe.predict.gam(object, newdata,..: Length of variable
2 is 1 != length of row names (41)
Dumped
His solution:
/ for(i in 1:3) {/
/+ form <- substitute(y ~ poly(x, .i), list(.i = i))/
/+ ff <- lm(form, old.data)/
/+ print(predict.gam(ff, new.x))/
/+ }/
/[1] 161.64/
/[1] 462.64/
/[1] 463.56/
The second problem occurs when the gam(), glm(), or lm() command that
created object referred to in predict.gam(object, newdata) somehow changes
the dimensions of the original data. The case I am certain of is using
na.action=na.omit. The solution is to groom your original data before
creating the gam object:
/original.data <- na.omit(original.data)/
I suspect that use of the `subscript=` argument, or using subscripting
when specifying the data= argument in gam(), glm(), or lm() will also cause
the problem.
There may be other conditions that cause predict.gam to dump, but scoping
problems and data subscripting seem to be two of them.
************************************************************************
John Thaden, Ph.D., Instructor jjthaden@life.uams.edu
Department of Geriatrics (501) 661-1202 x 2986
University of Arkansas for Medical Sciences FAX: (501) 671-2510
mail & ship to: J. L. McClellan V.A. Medical Center
Research-151 (Room GB103, GC124)
4300 West 7th Street
Little Rock AR 72205 USA
************************************************************************
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news
--
Dr Eva Cantoni phone : (+41) 22 379 8240
Econométrie - Univ. Genève fax : (+41) 22 379 8299
40, Bd du Pont d'Arve e-mail : Eva.Cantoni@metri.unige.ch
CH-1211 Genève 4 http://www.unige.ch/ses/metri/cantoni
|