s-news
[Top] [All Lists]

gam() and substitute()

To: s-news@lists.biostat.wustl.edu
Subject: gam() and substitute()
From: Eva Cantoni <Eva.Cantoni@metri.unige.ch>
Date: Fri, 12 Nov 2004 10:38:58 +0100
User-agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.3.1) Gecko/20030508
I'm writing a function which the (simplified) core is

myfunction <- function(formula=formula(data), data=sys.parent())
{

n <- nrow(data)
thiscsample <- data[1:(n-10),] # construction sample
thisvsample <- data[(n-9):n,] # validation sample

thisgam <- gam(formula=formula,data=thiscsample) # Fit on the construction sample predict.gam(thisgam,newdata=thisvsample,type="terms") # Extract prediction on the validation sample

}

When I call the function on my dataset, this call works:
> myfunction(y ~ s(x1,df=3)+s(x2,df=4), data = mydata)

but not this one
> dfx1 <- 3; dfx2 <- 4
> myfunction(y ~ s(x1,df=dfx1)+s(x2,df=dfx2), data = mydata)
> Problem in myfunction(y ~ s(x1, df = dfx1) +..: Length of dfx1 (variable 2) is 1 != length of others (405)


It doesn't work, because when predicting on new data, predict.gam cannot access the degrees of freedom. I think the solution goes through the use of substitute() (that I have been able to use succesfully in others circumstances), but I can't figure out here how to define the list to be "substituted" (For the use of substitute() with gam() see also the message below from the Snews archive).
Any idea?

Thank you for your help,
Eva Cantoni

-----------------------------------------------------------------------------
# John Thaden, from the S-new mailing list ( Tue, 08 Sep 1998 01:52:46 -0500).

In May, I wrote both S-News and MathSoft customer support because
predict.gam() often gave me arcane error messages, such as . . .
Error in lm.fit.qr(x, y, singular.ok = ..1): Number of observations in x and y not equal
or . . .
   Error in safe.predict.gam(object, newdata, type,..:
   Length of variable 1 is 1 != length of row names (40)
or . . .
Warning in x * w.factor: Length of longer object is not a multiple of the length of the shorter object
  Error in names<-: Invalid length for names attribute:
  y = structure(..

Thanks to Brian Ripley, Rachel Fewster, Bill Venables, StatSci's Julie
Dryden, and an unnamed StatSci/MathSoft statistician, I've gained insight
into predict.gam() that I'll share:

A command

  >predict.gam(object, newdata)

results in a call to safe.predict.gam(), which among other things uses
rbind.data.frame() to make a new data frame that includes both original
data and newdata (this is integral to the `safe' method of predict.gam).  I
think most problems are problems with safe.predict.gam getting the original
data. So far, I know of two such problems. The first is violation of scoping rules. Reviewing these . . .
/ When you try to reference a variable in a function, S looks first/
/ for the variable /
/ (1) in that function/
/ (2) NOT in any function that called that function/
/ (3) on frame 1/
/ (4) on frame 0/
/ (5) on permanent data bases./

So if the old data was created in a function you write that also calls
predict.gam(), then safe.predict.gam() won't see it.  The solution is to
assign the old data to frame 1.  A special case of this is for() loops, so
from Bill Venables via the archives:

/tmp <- -20:20/
/old.data <- data.frame(x = tmp, y = tmp + tmp^2 + rnorm(tmp))/
/new.x <- data.frame(x = 21)/
/for(i in 1:3) {/
>/+  ff <- lm(y ~ poly(x, i), old.data)/
>/+  print(predict.gam(ff, new.x))/
>/}/
Error in safe.predict.gam(object, newdata,..: Length of variable
        2 is 1 != length of row names (41)
Dumped

His solution:
/ for(i in 1:3) {/
/+   form <- substitute(y ~ poly(x, .i), list(.i = i))/
/+   ff <- lm(form, old.data)/
/+   print(predict.gam(ff, new.x))/
/+ }/
/[1] 161.64/
/[1] 462.64/
/[1] 463.56/

The second problem occurs when the gam(), glm(), or lm() command that
created object referred to in predict.gam(object, newdata) somehow changes
the dimensions of the original data.  The case I am certain of is using
na.action=na.omit.  The  solution is to groom your original data before
creating the gam object:

/original.data <- na.omit(original.data)/

I suspect that use of the `subscript=`  argument, or using subscripting
when specifying the data= argument in gam(), glm(), or lm() will also cause
the problem.

There may be other conditions that cause predict.gam to dump, but scoping
problems and data subscripting seem to be two of them.


************************************************************************
John Thaden, Ph.D., Instructor               jjthaden@life.uams.edu
Department of Geriatrics                     (501) 661-1202 x 2986
University of Arkansas for Medical Sciences  FAX: (501) 671-2510

      mail & ship to:  J. L. McClellan V.A. Medical Center
                       Research-151 (Room GB103, GC124)
                       4300 West 7th Street
                       Little Rock AR 72205 USA
************************************************************************
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message:  unsubscribe s-news



--

Dr Eva Cantoni                 phone  : (+41) 22 379 8240
Econométrie - Univ. Genève     fax    : (+41) 22 379 8299
40, Bd du Pont d'Arve          e-mail : Eva.Cantoni@metri.unige.ch
CH-1211 Genève 4 http://www.unige.ch/ses/metri/cantoni



<Prev in Thread] Current Thread [Next in Thread>