tapply is pretty limited -- its first argument must be a vector,
not a data frame. I'd use split and lapply instead.
If I understand what you want, I'd start with
    split(data[-1], data$index)
to give you a list of three data frames, one per index value.
For each one (call it dataj) you can use colSums:
    colSums(is.na(dataj))
    colSums(!is.na(dataj) & dataj=="non-detect")
You can operate on all three data frames at once using lapply or sapply.
Putting that together:
    lapply(split(data[-1], data$index),
           function(dataj){
             cbind("NA"=colSums(is.na(dataj)),
                   "non-detect"=colSums(!is.na(dataj) &
                                        dataj=="non-detect"))
           })
Transcript and additional comment below.
-- Tim Hesterberg
> data <- data.frame(index=c("A","C","B","B","A","C"),
+ R1 = c("NA","non-detect","1.03","1.55","NA","non-detect"),
+ R2 = c("345.6","non-detect","NA","NA","234.5","NA"))
>
> split(data[-1], data$index)
$A:
R1 R2
1 NA 345.6
5 NA 234.5
$B:
R1 R2
3 1.03 NA
4 1.55 NA
$C:
R1 R2
2 non-detect non-detect
6 non-detect NA
>
> # example of operations on one of the data frames
> dataj <- split(data[-1], data$index)[[1]]
> colSums(is.na(dataj))
R1 R2
2 0
> colSums(!is.na(dataj) & dataj=="non-detect")
R1 R2
0 0
>
> lapply(split(data[-1], data$index),
+ function(dataj){
+ cbind("NA"=colSums(is.na(dataj)),
+ "non-detect"=colSums(!is.na(dataj) &
+ dataj=="non-detect"))
+ })
$A:
NA non-detect
R1 2 0
R2 0 0
$B:
NA non-detect
R1 0 0
R2 2 0
$C:
NA non-detect
R1 0 2
R2 1 1
Additional comment: you can turn that last result into a 3-way
array using
    array(unlist(that result), dim = c(2,2,3), dimnames = etc.)
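
A minimal sketch of one way to build that 3-way array, assuming plain R
rather than S-PLUS. The transcript above shows S-PLUS counting the "NA"
strings as missing; in R the string "NA" is not a missing value, so the
sketch enters NA directly. The names counts and counts3 and the dimnames
labels (column, count, index) are mine, chosen for illustration only.

```r
# Same toy data as the transcript, but with real missing values (NA)
# instead of the string "NA" -- an assumption made for plain R.
data <- data.frame(
  index = c("A", "C", "B", "B", "A", "C"),
  R1 = c(NA, "non-detect", "1.03", "1.55", NA, "non-detect"),
  R2 = c("345.6", "non-detect", NA, NA, "234.5", NA),
  stringsAsFactors = FALSE
)

# One 2 x 2 matrix of counts per index value (rows: R1, R2;
# columns: "NA", "non-detect").
counts <- lapply(split(data[-1], data$index), function(dataj) {
  cbind("NA" = colSums(is.na(dataj)),
        "non-detect" = colSums(!is.na(dataj) & dataj == "non-detect"))
})

# unlist() flattens each matrix column by column, which is the order
# array() expects, so the third dimension indexes the groups A, B, C.
counts3 <- array(unlist(counts),
                 dim = c(2, 2, 3),
                 dimnames = list(column = rownames(counts[[1]]),
                                 count = colnames(counts[[1]]),
                                 index = names(counts)))

# Number of non-detects in column R1 for index C
counts3["R1", "non-detect", "C"]

# Long-format data frame: one row per column/count/index combination
counts_df <- as.data.frame(as.table(counts3))
```

With 40 columns the dim would be c(40, 2, 3); taking it from
dim(counts[[1]]) and length(counts) avoids hard-coding the sizes.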
>Hello All,
>I have a tapply problem, which I thought was an easy one, but the solution has
>nonetheless eluded me (more coffee?).
>
>My dataframe structure is this:
>
>index R1 R2......................R40
>
>A NA 345.6
>C non-detect non-detect
>B 1.03 NA
>B 1.55 NA
>A NA 234.5
>C non-detect NA
>.
>.
>.
>
>What I need are simultaneous counts of 1) NA's and 2) non-detect's for each
>column Rx, for each index (n=3), across some 40 columns and some 150K records.
>My idea is to cbind the results of this query into a dataframe for analysis. I
>just can't seem to get the correct syntax.
>
>I should say that I have tried flipping between a datatype of factor and
>character for both the index and the Rx's, but that hasn't helped me, probably
>since I don't have the syntax of the command correct yet.
>
>A further question is, then, what is the (more?) correct datatype (factor or
>char) for both index and data columns, for proper input into tapply?
>
>I hope I am clear enough with my explanation.
>
>best regards,
>Mike Slattery
>
>
>
>Michael W. Slattery
>Geologist, Ohio EPA
>50 West Town Street, Suite 700
>Columbus OH, 43215
>michael.slattery@epa.state.oh.us
>614-728-1221 (Ph)
>614-644-2909 (Fax)