For many years I have been using a script for 'by' problems offered by
Trevor Hastie and modified by Frank Harrell. I have searched for the
original S-News traffic on this and could not find it. That does not surprise
me, as it is pre-1998 and predates my own archive of useful
S-News text.
I have a 'data processing' task (files up to 100,000 records) associated
with preparing geochemical maps of Canada. To do several operations within
one function I have taken the original 'by' function and modified it. I
cannot get it to work, and I know why: it is the whole problem of passing
'arguments' between frames. I have been working on this and have to admit I
cannot solve it using either the 'by' function or lapply (and I think I
really need the former), even with both V&R texts at hand. It is probably
obvious to an experienced script writer, but that I am not. I am using S+
6.1 Professional under Win 2K.
The original Hastie and Harrell function was by(group, exp, data), and my
stand-alone use of it would be, for example,
by(Geol,summary1(Cu,paste(group)),ngr.lakes), where
summary1(Cu,paste(group)) is exp.
The new script, ngr.by, is:
function(group, x, file = NULL, exp, data = sys.parent())
{
#
# Based on a function by Trevor Hastie, modified by F. Harrell for
# data omitted
#
if(is.null(file)) stop("\nMust supply file name for sink, e.g., \"d:\\\\temp\\\\file\"")
ngr.temp <- rg.ngr.ltdl.fix(x, zero2na = T)
cat(" Variable", deparse(substitute(x)), "subset by",
deparse(substitute(group)),
"using:\n\t", deparse(substitute(exp)), "\n")
sink(paste(file, ".txt", sep = ""))
cat("Elem,Group,N,NA,Min,2%ile,5%ile,10%ile,25%ile,Median,75%ile,90%ile,95%ile,98%ile,Max,LCI,UCI,MAD,IQSD,Mean,SD,CV %")
on.exit(sink())
#
G <- substitute(group)
exp <- substitute(exp)
G <- factor(eval(G, local = data))
for(group in levels(G)) {
eval(exp, local = c(data[G == group, ], list(group = group)))
}
invisible()
}
The actual call I have been trying is
ngr.by(Geol, Cu, "d:\\temp\\test", summary1(ngr.temp$x,paste(group)), ngr.lakes)
The modification above first does some preprocessing related to zeros, NAs and
less-than-quantification-level data stored as negative values; this step is
vectorized, fast and works as desired. It then opens a file for output, with a
first-record header, that can be read straight into Excel. Finally, I want the
function to run the 'by' operation to generate summary statistics for the
Excel file that will be distributed to users.
If anyone can point out my error(s) I would be most grateful for the
assistance.
Robert G. (Bob) Garrett
Applied Geochemistry and Mineralogy Subdivision
Geological Survey of Canada
Natural Resources Canada
601 Booth St., Ottawa, Ontario K1A 0E8

Sous-division de la géochimie et minéralogie
Commission géologique du Canada
Ressources naturelles Canada
601 rue Booth, Ottawa (Ontario) K1A 0E8

Internet: garrett@gsc.NRCan.gc.ca
Tel.: 613+995-4517 FAX: 613+996-3726