Re: Writing a function taking a data.frame column name as a parameter

To: "Hunsicker, Lawrence" <lawrence-hunsicker@uiowa.edu>
Subject: Re: Writing a function taking a data.frame column name as a parameter
From: Tim Hesterberg <TimHesterberg@gmail.com>
Date: Fri, 26 Dec 2008 21:56:14 -0800
Cc: <s-news@lists.biostat.wustl.edu>
In-reply-to: <2B80F69A8A189D48B0E668B0BBC6BA4201E01111@HC-MAIL13.healthcare.uiowa.edu> (lawrence-hunsicker@uiowa.edu)
References: <2B80F69A8A189D48B0E668B0BBC6BA4201E01111@HC-MAIL13.healthcare.uiowa.edu>
Reply-to: TimHesterberg@gmail.com (Tim Hesterberg)

First,
  working7$x
is NULL, because working7 doesn't have a variable named "x".
Instead use:
  working7[[x]]
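
A minimal sketch of the difference, using a small made-up data frame
(the object 'd' and its values are hypothetical, not taken from the
original script; this was written against R, which shares the S syntax
for both operators):

```r
# Hypothetical data frame standing in for working7
d <- data.frame(Sexe = c("F", "M", NA), anyCVD = c(0, 1, 1))
x <- "Sexe"

d$x       # NULL: `$` looks for a column literally named "x"
d[[x]]    # the Sexe column: x is evaluated to "Sexe" first

# Keep only the rows where the chosen column is not missing
temp <- d[!is.na(d[[x]]), ]
```

With 'x' holding the column name as a character string, the same line
works unchanged inside a function.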

Later, your formula
  anyCVD ~ x
won't do what you want, again because the data frame doesn't have
a variable named "x".  To replace "x" with the
variable name, you could
(a) use substitute()
(b) create a text version of the call to glmmPQL using paste(),
    followed by eval(parse(the text version))
Neither of these is straightforward.  Here is a short
example using eval and parse:
  eval(parse(text = "3+2")[[1]])
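
As a rough sketch of both approaches applied to a model call, here is
a version that uses lm() as a stand-in for glmmPQL() so that it stays
small.  The data frame 'd', its values, and the names 'callText',
'fitA' and 'fitB' are made up for illustration; this was written
against R, and details may differ in S-PLUS:

```r
# Hypothetical data; lm() stands in for glmmPQL() to keep the sketch small
d <- data.frame(anyCVD = c(0, 1, 1, 0, 1, 0),
                Age    = c(50, 61, 70, 45, 66, 52))
x <- "Age"   # column name supplied as a character string

# (b) build the call as text with paste(), then parse and evaluate it
callText <- paste("lm(anyCVD ~ ", x, ", data = d)", sep = "")
fitB <- eval(parse(text = callText)[[1]])

# (a) substitute() the column name into an unevaluated call, then evaluate
fitA <- eval(substitute(lm(anyCVD ~ v, data = d), list(v = as.name(x))))
```

Both fits use the formula anyCVD ~ Age, so their coefficients should
agree.  The same pattern applies to the fixed= argument of glmmPQL.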

Finally, if glmmPQL is a typical modeling function, you
will probably run into scoping problems.  At bottom is my stock
response to scoping questions.

Tim Hesterberg

>Hi again, folks, and happy Boxing Day to all:
>
>I have written a script that works:
>
>temp <- working7[!is.na(working7$Sexe),]
>any.glmm<-glmmPQL(fixed = anyCVD ~ Sexe
>       , random = ~1|Region/Pays/Center, family = binomial(link=logit),
>data = temp)
>list("Sexe", anova(any.glmm), any.glmm$coefficients$fixed)
>
>I now want to automate this script to run with a number of columns in a
>data frame with over 100 columns.  It would be nice to set this up as a
>function that I could apply with sapply.   I have tried to define the
>function as follows:
>
>testfx <- function(x) {
>temp <- working7[!is.na(working7$x),]
>any.glmm<-glmmPQL(fixed = anyCVD ~ x
>               , random = ~1|Region/Pays/Center, family =
>binomial(link=logit), data = temp)
>list(x, anova(any.glmm), any.glmm$coefficients$fixed)
>}
>
>This compiles.  But when I try to run the function, entering a column name
>in the place of x (e.g., testfx("Sexe")), it always abends with the
>message:
>
>Ethnie.glmm <- testfx("Ethnie")
>Problem in testfx("Ethnie"): Length of anyCVD (variable 1) is 0 !=
>length of others (10)
>
>Same if I enter the column name  unquoted, or use an index.  
>
>What am I doing wrong?  How does one enter a column name (or column
>number) as a parameter in a self-defined function?
>
>As always, many thanks to anyone that can help me here.
>
>Larry Hunsicker

In S-PLUS, objects defined inside a function are local to that
function.  If you get an 'Object "nameOfObject" not found' message,
you've run into this.

For example, here the variable 'a' in 'Parent' is local:
    rm(a) # make sure there is no copy
    Parent <- function(x){
      a <- x
      Child()
    }
    Child <- function(){
      a
    }
    Parent(7)  # Fails, 'Object "a" not found'

You may get unexpected results if there is a copy of the object lying around:
    a <- -999  # suppose this value was created previously and not removed
    Parent(7)  # The result is -999, not 7.

Here are two workarounds:
(1) 'Child' could use
        get("a", frame = sys.parent())
to get the copy of "a" from its parent function.

(2) 'Parent' could save a copy in a globally-visible location:
    Parent <- function(x){
      a <- x
      assign("a", a, frame=1)
      Child()
    }

The problem occurs most often when you write a "Parent" function that
calls a statistical modeling function like 'lm'.  In this case
workaround (2) is appropriate.


--------------------------------------------------
Background and details:

The objects associated with a function are stored in its "frame".
There are also two global frames, stored in memory.  S-PLUS also
searches global databases, stored on disk.  Hence, when 'Child'
asks for an object, the search path is:

frame 3         Child's frame
(frame 2)           (Parent's frame -- is not visible to Child)
frame 1         "expression frame" (global)
frame 0         "session frame" (global)
database 1      working database, objects you create are stored here
database 2...   other databases - "splus", "stat", etc.

'Child' looks for objects in
(1) its own frame
    (any objects already defined inside Child)
(2) frame 1
    (the "expression frame", a temporary frame in memory -- this lasts
    until the current top-level command finishes)
(3) frame 0
    (the "session frame", a temporary frame in memory that persists
    between commands but disappears when you exit S-PLUS)
(4) databases

'Child' does not look anywhere between frame 1 and its own frame.
In other words, if you define something inside 'Parent', it is invisible
to 'Child'.

In the above example, 'Parent' has frame 2 and 'Child' has frame 3.
If `Child' in turn calls `Grandchild', that would be in frame 4.

A frame is essentially a list of objects that a function has
defined for its own use (but frame 0 and frame 1 are for global use).
For example, after "a <- x", then Parent's frame contains 'a' and 'x'.
Use assign() if you want to create a copy of 'a' somewhere else,
either in memory (frame 0 or higher) or on disk (databases).

The rules about where objects are searched for are known as "scoping
rules".  For additional information, see
* S-PLUS Programmer's Guide, section "Matching Names and Values",
* Becker, Chambers & Wilks "The New S Language" (the "blue book") pp 118-121
* Venables & Ripley "S Programming" pp 54-65.
