s-news
[Top] [All Lists]

Re: gam() and substitute()

To: Eva Cantoni <Eva.Cantoni@metri.unige.ch>, s-news@lists.biostat.wustl.edu
Subject: Re: gam() and substitute()
From: Tony Plate <tplate@blackmesacapital.com>
Date: Fri, 12 Nov 2004 09:32:14 -0700
In-reply-to: <419484B2.2020002@metri.unige.ch>
References: <419484B2.2020002@metri.unige.ch>
If I understand you correctly, here's how to use substitute() to get what you want.

The reason that this is necessary is that the formula operator is special in that it does not evaluate its arguments, it just records their syntactic form. This is usually what you want for a formula, except for cases like this where you want some of the variables in the formula substituted with values the variables have in the frame where the formula is defined. This is what to use substitute() for. I've changed the names of the variables in the formula to make this more obvious.

> myfunction(y ~ s(x1,df=dfx1)+s(x2,df=dfx2), data = mydata)
Warning in s.wam(X, z, wz, fit$smooth, which, fit$smooth.frame, b..: s.wam convergence not obtained in 30
  iterations
Problem in myfunction(y ~ s(x1, df = dfx1) + s(x2, df = dfx2), da..: Length of dfx1 (variable 2) is 1 != l
ength of others (10)
Evaluation frames saved in object "last.dump", use debugger() to examine them
> myfunction(substitute(y ~ s(x1,df=aValueForDfx1)+s(x2,df=aValueForDfx2), list(aValueForDfx1=dfx1, aValueForDfx2=dfx2)), data = mydata) Warning in s.wam(X, z, wz, fit$smooth, which, fit$smooth.frame, b..: s.wam convergence not obtained in 30
  iterations
   s(x1, df = 3) s(x2, df = 4)
11    0.07250727    -0.6226814
12   -0.73110591    -0.4656261
13    0.20016743     0.2530262
14    0.30420173    -0.2546245
15   -0.58072101     0.2584776
16   -0.53189753    -0.4196539
17    0.10450717    -0.6348349
18   -0.08733088     0.1375277
19   -0.11258987    -0.3180230
20   -0.55573630    -0.4244847
attr(, "constant"):
[1] -0.2203049

> # show the formula that is getting passed to gam()
> substitute(y ~ s(x1,df=aValueForDfx1)+s(x2,df=aValueForDfx2), list(aValueForDfx1=dfx1, aValueForDfx2=dfx2))
y ~ s(x1, df = 3) + s(x2, df = 4)
>


At Friday 02:38 AM 11/12/2004, Eva Cantoni wrote:
I'm writing a function which the (simplified) core is

myfunction <- function(formula=formula(data), data=sys.parent())
{

n <- nrow(data)
thiscsample <- data[1:(n-10),] # construction sample
thisvsample <- data[(n-9):n,] # validation sample

thisgam <- gam(formula=formula,data=thiscsample) # Fit on the construction sample predict.gam(thisgam,newdata=thisvsample,type="terms") # Extract prediction on the validation sample

}

When I call the function on my dataset, this call works:
> myfunction(y ~ s(x1,df=3)+s(x2,df=4), data = mydata)

but not this one
> dfx1 <- 3; dfx2 <- 4
> myfunction(y ~ s(x1,df=dfx1)+s(x2,df=dfx2), data = mydata)
> Problem in myfunction(y ~ s(x1, df = dfx1) +..: Length of dfx1 (variable 2) is 1 != length of others (405)


It doesn't work, because when predicting on new data, predict.gam cannot access the degrees of freedom. I think the solution goes through the use of substitute() (that I have been able to use succesfully in others circumstances), but I can't figure out here how to define the list to be "substituted" (For the use of substitute() with gam() see also the message below from the Snews archive).
Any idea?

Thank you for your help,
Eva Cantoni

-----------------------------------------------------------------------------
# John Thaden, from the S-new mailing list ( Tue, 08 Sep 1998 01:52:46 -0500).

In May, I wrote both S-News and MathSoft customer support because
predict.gam() often gave me arcane error messages, such as . . .
Error in lm.fit.qr(x, y, singular.ok = ..1): Number of observations in x and y not equal
or . . .
   Error in safe.predict.gam(object, newdata, type,..:
   Length of variable 1 is 1 != length of row names (40)
or . . .
Warning in x * w.factor: Length of longer object is not a multiple of the length of the shorter object
  Error in names<-: Invalid length for names attribute:
  y = structure(..

Thanks to Brian Ripley, Rachel Fewster, Bill Venables, StatSci's Julie
Dryden, and an unnamed StatSci/MathSoft statistician, I've gained insight
into predict.gam() that I'll share:

A command
  >predict.gam(object, newdata)

results in a call to safe.predict.gam(), which among other things uses
rbind.data.frame() to make a new data frame that includes both original
data and newdata (this is integral to the `safe' method of predict.gam).  I
think most problems are problems with safe.predict.gam getting the original
data.  So far, I know of two such problems.

The first is violation of scoping rules.  Reviewing these . . .
/ When you try to reference a variable in a function, S looks first/
/ for the variable /
/ (1) in that function/
/ (2) NOT in any function that called that function/
/ (3) on frame 1/
/ (4) on frame 0/
/ (5) on permanent data bases./

So if the old data was created in a function you write that also calls
predict.gam(), then safe.predict.gam() won't see it.  The solution is to
assign the old data to frame 1.  A special case of this is for() loops, so
from Bill Venables via the archives:

/tmp <- -20:20/
/old.data <- data.frame(x = tmp, y = tmp + tmp^2 + rnorm(tmp))/
/new.x <- data.frame(x = 21)/
/for(i in 1:3) {/
>/+  ff <- lm(y ~ poly(x, i), old.data)/
>/+  print(predict.gam(ff, new.x))/
>/}/
Error in safe.predict.gam(object, newdata,..: Length of variable
        2 is 1 != length of row names (41)
Dumped

His solution:
/ for(i in 1:3) {/
/+   form <- substitute(y ~ poly(x, .i), list(.i = i))/
/+   ff <- lm(form, old.data)/
/+   print(predict.gam(ff, new.x))/
/+ }/
/[1] 161.64/
/[1] 462.64/
/[1] 463.56/

The second problem occurs when the gam(), glm(), or lm() command that
created object referred to in predict.gam(object, newdata) somehow changes
the dimensions of the original data.  The case I am certain of is using
na.action=na.omit.  The  solution is to groom your original data before
creating the gam object:

/original.data <- na.omit(original.data)/

I suspect that use of the `subscript=`  argument, or using subscripting
when specifying the data= argument in gam(), glm(), or lm() will also cause
the problem.

There may be other conditions that cause predict.gam to dump, but scoping
problems and data subscripting seem to be two of them.

************************************************************************
John Thaden, Ph.D., Instructor               jjthaden@life.uams.edu
Department of Geriatrics                     (501) 661-1202 x 2986
University of Arkansas for Medical Sciences  FAX: (501) 671-2510

      mail & ship to:  J. L. McClellan V.A. Medical Center
                       Research-151 (Room GB103, GC124)
                       4300 West 7th Street
                       Little Rock AR 72205 USA
************************************************************************
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message:  unsubscribe s-news



--

Dr Eva Cantoni                 phone  : (+41) 22 379 8240
Econométrie - Univ. Genève     fax    : (+41) 22 379 8299
40, Bd du Pont d'Arve          e-mail : Eva.Cantoni@metri.unige.ch
CH-1211 Genève 4               http://www.unige.ch/ses/metri/cantoni


--------------------------------------------------------------------
This message was distributed by s-news@lists.biostat.wustl.edu.  To
unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
the BODY of the message:  unsubscribe s-news


<Prev in Thread] Current Thread [Next in Thread>