On Thu, 30 Oct 2003 10:02:40 -0500
"Garrett, Robert" <garrett@NRCan.gc.ca> wrote:
> For many years I have been using a script for 'by' problems offered by
> Trevor Hastie and modified by Frank Harrell. I have searched for the
> original S-News traffic on this and could not find it; that does not surprise
> me, as it is from before 1998, before I started keeping my own archive of
> useful S-News text.
>
> I have a 'data processing' task (files of up to 100,000 records) associated
> with preparing geochemical maps of Canada. To do several operations within
> one function, I have taken the original 'by' function and modified it. I
> cannot get it to work, and I know why: it is the whole problem of passing
> 'arguments' between frames. I have been working on this and have to admit I
> cannot solve it, using either the 'by' function or lapply (and I think I
> really need the former), even with both V&R texts at hand. It is probably
> obvious to an experienced script writer, but I am not one. I am using S+
> 6.1 Professional under Win 2K.
>
> The original Hastie and Harrell function was by(group, exp, data), and a
> stand-alone use of it would be, for example,
> by(Geol, summary1(Cu, paste(group)), ngr.lakes), where
> summary1(Cu, paste(group)) is exp.
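A minimal sketch of the by(group, exp, data) idiom described above, written for R, a close S dialect (S-PLUS 6.1's eval() takes the evaluation frame as `local =` instead, as in the ngr.by code below). It is not the Hastie/Harrell original, which is not shown here. It illustrates why the stand-alone call works: Cu is a column of the data frame, so it is visible inside the list in which each group's expression is evaluated. The names by_sketch, d, Geol and Cu in the example are illustrative only.

```r
# Minimal sketch of the by(group, exp, data) idiom, written for R.
# Named by_sketch so it does not mask R's own base::by(), which has a
# different signature (data, INDICES, FUN).
by_sketch <- function(group, exp, data) {
  caller <- parent.frame()
  # Evaluate the grouping variable as a column of `data`
  G <- factor(eval(substitute(group), data, caller))
  exp <- substitute(exp)                # capture the expression unevaluated
  out <- list()
  for (g in levels(G)) {
    idx <- which(G == g)                # which() drops NA group codes
    # exp sees this group's columns plus the label `group`;
    # anything else is looked up from the caller's environment.
    frame <- c(as.list(data[idx, , drop = FALSE]), list(group = g))
    out[[g]] <- eval(exp, frame, caller)
  }
  invisible(out)
}
```

For example, with d <- data.frame(Geol = c("A", "A", "B"), Cu = c(1, 3, 10)), the call by_sketch(Geol, mean(Cu), d) invisibly returns list(A = 2, B = 10).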
>
> The new script, ngr.by, is:
>
> function(group, x, file = NULL, exp, data = sys.parent())
> {
> #
> # Based on a function by Trevor Hastie, modified by F. Harrell for
> # data omitted
> #
> if(is.null(file)) stop("\nMust supply file name for sink, e.g., \"d:\\\\temp\\\\file\"")
> ngr.temp <- rg.ngr.ltdl.fix(x, zero2na = T)
> cat(" Variable", deparse(substitute(x)), "subset by",
> deparse(substitute(group)),
> "using:\n\t", deparse(substitute(exp)), "\n")
> sink(paste(file, ".txt", sep = ""))
>
> cat("Elem,Group,N,NA,Min,2%ile,5%ile,10%ile,25%ile,Median,75%ile,90%ile,95%ile,98%ile,Max,LCI,UCI,MAD,IQSD,Mean,SD,CV %")
> on.exit(sink())
> #
> G <- substitute(group)
> exp <- substitute(exp)
> G <- factor(eval(G, local = data))
> for(group in levels(G)) {
> eval(exp, local = c(data[G == group, ], list(group =
> group)))
> }
> invisible()
> }
>
> The actual call I have been trying is ngr.by(Geol, Cu, "d:\\temp\\test",
> summary1(ngr.temp$x, paste(group)), ngr.lakes). The modification above does
> some preprocessing related to zeros, NAs and less-than-quantification-level
> data stored as negative values; this step is vectorized, fast and works as
> desired. It then opens a file for output that can be sucked straight into
> Excel, with a header as the first record. Finally, I want the function to
> run the 'by' operation to generate the summary statistics for the Excel file
> that will be distributed to users.
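A minimal sketch of one way around the frame problem, assuming R (in S-PLUS 6.1 the same idea applies with eval(..., local = ...)). In the call above, summary1(ngr.temp$x, paste(group)) is evaluated inside the list built from data[G == group, ] plus group, so ngr.temp, a variable local to ngr.by, is not found there; even if it were, it would presumably hold the values for all records rather than the current group's. The sketch instead puts each group's subset of the preprocessed values into that list under the name x. It also looks up x (Cu) as a column of data; in the original, x is an ordinary argument, which only works if Cu is visible from the caller (for example with ngr.lakes attached). The argument prep is a hypothetical stand-in for rg.ngr.ltdl.fix(), which is not shown in the post, and the three-column header and cat() expression used in testing stand in for summary1().

```r
# Hypothetical sketch of a corrected ngr.by, written for R (a close S dialect).
# `prep` stands in for the poster's rg.ngr.ltdl.fix(); any vectorized
# function returning a vector the same length as its input will do.
ngr_by_sketch <- function(group, x, file = NULL, exp, data, prep = identity) {
  if (is.null(file)) stop("Must supply file name for sink")
  caller <- parent.frame()
  # Look up the grouping variable and the element (e.g. Geol, Cu)
  # as columns of `data`, not in the caller's frame.
  G  <- factor(eval(substitute(group), data, caller))
  xv <- prep(eval(substitute(x), data, caller))
  exp <- substitute(exp)              # keep the per-group expression unevaluated
  sink(paste(file, ".txt", sep = ""))
  on.exit(sink())                     # restore console output even on error
  cat("Group,N,Median\n")             # header record, ended by a newline
  for (g in levels(G)) {
    idx <- which(G == g)
    # The key change: this group's preprocessed values go into the
    # evaluation list as `x`, next to `group`. A variable local to this
    # function (like ngr.temp in the original) would not be found otherwise.
    frame <- c(as.list(data[idx, , drop = FALSE]),
               list(x = xv[idx], group = g))
    eval(exp, frame, caller)
  }
  invisible()
}
```

With the poster's own functions, the corresponding call would hypothetically be ngr_by_sketch(Geol, Cu, "d:\\temp\\test", summary1(x, paste(group)), ngr.lakes, prep = function(v) rg.ngr.ltdl.fix(v, zero2na = T)$x), assuming, as ngr.temp$x suggests, that rg.ngr.ltdl.fix() returns a list with a component x.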
>
> If anyone can point out my error(s), I would be most grateful for the
> assistance.
>
>
> Robert G. (Bob) Garrett
> Applied Geochemistry and Mineralogy
>
I was not an author of by, but you might check out the summarize function in
the Hmisc library to see whether it helps.
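Hmisc's summarize() itself is not reproduced here. As a swapped-in, base-R illustration of the same direction, the sketch below computes per-group statistics with split() and lapply() and returns an ordinary matrix, so no unevaluated expression has to be passed between frames; write.csv() can then produce a file with a header row that Excel reads directly. group_table() is a hypothetical name, and it covers only some of the columns in the header above (no LCI, UCI, MAD, IQSD or CV).

```r
# Base-R sketch in the spirit of the suggestion above (this is NOT
# Hmisc's summarize, just split() + lapply()).
group_table <- function(x, group) {
  probs <- c(0.02, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.98)
  one <- function(v) {
    n_na <- sum(is.na(v))
    v <- v[!is.na(v)]
    # A group with no non-missing values gives Inf/NA here, with warnings
    c(N = length(v), NAs = n_na, Min = min(v),
      quantile(v, probs),            # columns named "2%", "5%", ..., "98%"
      Max = max(v), Mean = mean(v), SD = sd(v))
  }
  # One row per group level; the row names carry the group labels
  do.call(rbind, lapply(split(x, group), one))
}
```

For example, write.csv(group_table(ngr.lakes$Cu, ngr.lakes$Geol), "test.csv") would write one row per Geol level.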
---
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University