s-news

Re: A 'by' problem

To: "Garrett, Robert" <garrett@NRCan.gc.ca>
Subject: Re: A 'by' problem
From: Frank E Harrell Jr <feh3k@spamcop.net>
Date: Thu, 30 Oct 2003 10:30:47 -0500
Cc: s-news@lists.biostat.wustl.edu
In-reply-to: <3E3C279AF3F9D411BAA00002A529150E0664AA42@S0-OTT-X10.NRCan.gc.ca>
Organization: Vanderbilt University
References: <3E3C279AF3F9D411BAA00002A529150E0664AA42@S0-OTT-X10.NRCan.gc.ca>
On Thu, 30 Oct 2003 10:02:40 -0500
"Garrett, Robert" <garrett@NRCan.gc.ca> wrote:

> For many years I have been using a script for 'by' problems offered by
> Trevor Hastie and modified by Frank Harrell.  I have searched for the
> original S-News traffic on this and could not find it; that does not
> surprise me, as it is pre-1998 and predates my own archive of useful
> S-News text.
> 
> I have a 'data processing' task (files up to 100,000 records) associated
> with preparing geochemical maps of Canada.  To do several operations within
> one function I have taken the original 'by' function and modified it.  I
> cannot get it to work, and I know why: it is the whole problem of passing
> 'arguments' between frames.  I have been working on this and have to admit I
> cannot solve it using either the 'by' function or lapply (and I think I
> really need the former), even with both V&R texts at hand.  It is probably
> obvious to an experienced script writer, but that I am not.  I am using S+
> 6.1 Professional under Win 2K.
> 
> The original Hastie and Harrell function was by(group, exp, data), and my
> stand-alone use of it would be, for example,
> by(Geol, summary1(Cu, paste(group)), ngr.lakes), where
> summary1(Cu, paste(group)) is exp.
> 
> The new script, ngr.by, is:
> 
> function(group, x, file = NULL, exp, data = sys.parent())
> {
>       #
>       # Based on a function by Trevor Hastie, modified by F. Harrell for data omitted
>       #
>       if(is.null(file)) stop("\nMust supply file name for sink, e.g., \"d:\\\\temp\\\\file\"")
>       ngr.temp <- rg.ngr.ltdl.fix(x, zero2na = T)
>       cat("  Variable", deparse(substitute(x)), "subset by",
>               deparse(substitute(group)),
>               "using:\n\t", deparse(substitute(exp)), "\n")
>       sink(paste(file, ".txt", sep = ""))
>       
>       cat("Elem,Group,N,NA,Min,2%ile,5%ile,10%ile,25%ile,Median,75%ile,90%ile,95%ile,98%ile,Max,LCI,UCI,MAD,IQSD,Mean,SD,CV %")
>       on.exit(sink())
>       #
>       G <- substitute(group)
>       exp <- substitute(exp)
>       G <- factor(eval(G, local = data))
>       for(group in levels(G)) {
>               eval(exp, local = c(data[G == group,  ], list(group = group)))
>       }
>       invisible()
> }
> 
> The actual call I have been trying is
> ngr.by(Geol, Cu, "d:\\temp\\test", summary1(ngr.temp$x, paste(group)), ngr.lakes).
> The modified function first does some preprocessing of zeros, NAs, and
> less-than-quantification-level data stored as negative values (this step
> is vectorized, fast, and works as desired); it then opens a file for output
> that can be sucked straight into Excel, with a header as the first record;
> finally, I want it to run the 'by' operation to generate the summary
> statistics for the Excel file that will be distributed to users.
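One likely problem, offered as a guess rather than a diagnosis: exp is evaluated in c(data[G == group, ], list(group = group)), so columns such as Cu are subset to the current group, but ngr.temp$x refers to the whole cleaned vector, so each group would be summarized over all records (whether ngr.temp is visible there at all depends on S-PLUS's frame rules). A minimal sketch of one workaround, assuming R rather than S-PLUS 6.1 (R's eval() takes envir and enclos instead of local =), is to subset the cleaned values together with the data and put them in the per-group list. The names ngr.by.sketch, clean() and x.clean are hypothetical stand-ins for rg.ngr.ltdl.fix() and summary1(); clean() assumes zeros and negative values simply become NA. The header and rows are written through a file connection instead of sink().

```r
# Minimal sketch in R, not S-PLUS 6.1: R's eval() takes envir/enclos, not local =.
# ngr.by.sketch(), clean() and x.clean are HYPOTHETICAL stand-ins; clean()
# ASSUMES zeros and negative (below-quantification) values should become NA.
ngr.by.sketch <- function(group, x, exp, data,
                          clean = function(v) ifelse(v <= 0, NA, v)) {
  caller <- parent.frame()
  # Look up the grouping and measurement columns inside the data frame,
  # falling back to the caller's frame, so they need not be attached.
  G <- factor(eval(substitute(group), data, caller))
  x.clean <- clean(eval(substitute(x), data, caller))
  e <- substitute(exp)
  out <- list()
  for (g in levels(G)) {
    rows <- !is.na(G) & G == g
    # Per-group frame: this group's rows of every column, the level label
    # as `group`, and the cleaned values for the same rows as `x.clean`.
    frame <- c(as.list(data[rows, , drop = FALSE]),
               list(group = g, x.clean = x.clean[rows]))
    out[[g]] <- eval(e, frame, caller)
  }
  out
}

# Usage with made-up data: group A has one negative value, group B a zero.
lakes <- data.frame(Geol = c("A", "A", "B", "B", "B"),
                    Cu = c(10, -2, 5, 7, 0))
res <- ngr.by.sketch(Geol, Cu,
                     c(N = length(x.clean),
                       Median = median(x.clean, na.rm = TRUE)),
                     lakes)

# Header record first, then one comma-separated line per group.
out.file <- tempfile(fileext = ".txt")
con <- file(out.file, "w")
cat("Group,N,Median\n", file = con)
for (g in names(res)) {
  cat(paste(g, res[[g]][["N"]], res[[g]][["Median"]], sep = ","), "\n",
      sep = "", file = con)
}
close(con)
```

Putting the cleaned values into the same list as the subset columns keeps them aligned with each group's rows without depending on which frames S or R searches.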
> 
> If anyone can point out my error(s), I would be most grateful for the
> assistance.
> 
> 
> Robert G. (Bob) Garrett
> Applied Geochemistry and Mineralogy
> 

I was not an author of 'by'.  But you might check out the summarize function
in the Hmisc library to see if it helps.

---
Frank E Harrell Jr    Professor and Chair            School of Medicine
                      Department of Biostatistics    Vanderbilt University
