s-news

Re: tapply question

To: "Michael Slattery" <Michael.Slattery@epa.state.oh.us>
Subject: Re: tapply question
From: Tim Hesterberg <timh@insightful.com>
Date: Thu, 06 Mar 2008 08:28:55 -0800
Cc: <s-news@lists.biostat.wustl.edu>
In-reply-to: <47CFB658.C718.0085.0@epa.state.oh.us> (Michael.Slattery@epa.state.oh.us)
References: <47CFB658.C718.0085.0@epa.state.oh.us>
tapply is pretty limited -- its first argument must be a vector,
not a data frame.  I'd use split and lapply instead.
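Here is a minimal sketch of the tapply limitation. It is written in R, not S-PLUS, so details may differ from the transcript below. It uses the transcript's example data, except that real NA values replace the string "NA", because R treats that string as an ordinary value rather than a missing one. tapply handles one vector (one column) at a time, so covering all columns means looping over them yourself:

```r
# Sketch in R (not S-PLUS). Real NA values are used because in R the
# string "NA" is an ordinary value, not a missing one.
data <- data.frame(index = c("A", "C", "B", "B", "A", "C"),
                   R1 = c(NA, "non-detect", "1.03", "1.55", NA, "non-detect"),
                   R2 = c("345.6", "non-detect", NA, NA, "234.5", NA),
                   stringsAsFactors = FALSE)

# tapply takes a single vector, so it handles one column at a time:
# here, the number of missing R1 values within each index group.
na_R1 <- tapply(is.na(data$R1), data$index, sum)
print(na_R1)

# To cover every measurement column you loop over the columns yourself.
# The result is a matrix of NA counts with groups as rows and columns
# R1, R2 as columns.
na_all <- sapply(data[-1], function(col) tapply(is.na(col), data$index, sum))
print(na_all)
```

This works, but it only counts one kind of value per pass. The split/lapply approach below handles both counts in one function.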

If I understand what you want, I'd start by
        split(data[-1], data$index)
to give you a list of three data frames.
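Here is a quick R sketch of what split returns for the example data. It again uses real NA values instead of the string "NA", which R does not treat as missing. data[-1] drops the index column, and split returns a list with one data frame per index value:

```r
# Sketch in R: real NA values stand in for the string "NA".
data <- data.frame(index = c("A", "C", "B", "B", "A", "C"),
                   R1 = c(NA, "non-detect", "1.03", "1.55", NA, "non-detect"),
                   R2 = c("345.6", "non-detect", NA, NA, "234.5", NA),
                   stringsAsFactors = FALSE)

# One data frame (measurement columns only) per level of index.
groups <- split(data[-1], data$index)
print(names(groups))          # group labels, in sorted level order
print(sapply(groups, nrow))   # rows per group
print(groups$C)               # the rows whose index is "C"
```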
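For each of those data frames (call it dataj) you can use colSums.
The !is.na(dataj) term keeps a missing cell from turning that
column's "non-detect" count into NA:
        colSums(is.na(dataj))
        colSums(!is.na(dataj) & dataj=="non-detect")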
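Here is why the !is.na(dataj) guard matters, shown as a small R sketch. It uses group C because that group mixes "non-detect" and missing cells, and it uses real NA values since R does not treat the string "NA" as missing. Comparing a missing cell with "non-detect" gives NA, and one NA makes that column's sum NA. With the guard, the missing cell becomes FALSE instead:

```r
# Sketch in R: real NA values stand in for the string "NA".
data <- data.frame(index = c("A", "C", "B", "B", "A", "C"),
                   R1 = c(NA, "non-detect", "1.03", "1.55", NA, "non-detect"),
                   R2 = c("345.6", "non-detect", NA, NA, "234.5", NA),
                   stringsAsFactors = FALSE)

dataj <- split(data[-1], data$index)[["C"]]

na_counts <- colSums(is.na(dataj))                          # missing cells per column
unguarded <- colSums(dataj == "non-detect")                 # R2 sum is NA
guarded   <- colSums(!is.na(dataj) & dataj == "non-detect") # NA cells count as FALSE
print(rbind(na_counts, unguarded, guarded))
```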
You can operate on all three data frames at once using lapply or sapply.
Putting that together:
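        # split the measurement columns by index, then for each group
        # return a (column x count-type) matrix of NA and non-detect counts
        lapply(split(data[-1], data$index),
               function(dataj){
                 cbind("NA"=colSums(is.na(dataj)),
                       "non-detect"=colSums(!is.na(dataj) &
                                            dataj=="non-detect"))
               })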
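Here is a self-contained R version of the same pipeline, offered as a sketch. The helper name count_codes is hypothetical and does not appear in the original post. The data again uses real NA values, because R does not treat the string "NA" as missing. With that change, the counts should match the S-PLUS transcript below:

```r
# Sketch in R: real NA values stand in for the string "NA".
data <- data.frame(index = c("A", "C", "B", "B", "A", "C"),
                   R1 = c(NA, "non-detect", "1.03", "1.55", NA, "non-detect"),
                   R2 = c("345.6", "non-detect", NA, NA, "234.5", NA),
                   stringsAsFactors = FALSE)

# Hypothetical helper: for one group's data frame, return a matrix with one
# row per measurement column and columns "NA" and "non-detect" (counts).
count_codes <- function(dataj) {
  cbind("NA"         = colSums(is.na(dataj)),
        "non-detect" = colSums(!is.na(dataj) & dataj == "non-detect"))
}

res <- lapply(split(data[-1], data$index), count_codes)
print(res)
```

Keeping the per-group work in a named function makes it easy to reuse the same helper with sapply, or to add another count column later.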

Transcript and additional comment below.

-- Tim Hesterberg
I'll teach short courses:
  Advanced Programming in S-PLUS: San Antonio TX, March 26-27, 2008.
  Bootstrap Methods and Permutation Tests: San Antonio, March 28, 2008.
More info at http://www.insightful.com/Hesterberg


> data <- data.frame(index=c("A","C","B","B","A","C"),
+                    R1 = c("NA","non-detect","1.03","1.55","NA","non-detect"),
+                    R2 = c("345.6","non-detect","NA","NA","234.5","NA"))
> 
>       split(data[-1], data$index)
$A:
  R1    R2 
1 NA 345.6
5 NA 234.5

$B:
    R1 R2 
3 1.03 NA
4 1.55 NA

$C:
          R1         R2 
2 non-detect non-detect
6 non-detect         NA

> 
> # example of operations on one of the data frames
> dataj <- split(data[-1], data$index)[[1]]
>       colSums(is.na(dataj))
 R1 R2 
  2  0
>       colSums(!is.na(dataj) & dataj=="non-detect")
 R1 R2 
  0  0
> 
>       lapply(split(data[-1], data$index),
+                function(dataj){
+                  cbind("NA"=colSums(is.na(dataj)),
+                        "non-detect"=colSums(!is.na(dataj) & dataj=="non-detect"))
+                })
$A:
   NA non-detect 
R1  2          0
R2  0          0

$B:
   NA non-detect 
R1  0          0
R2  2          0

$C:
   NA non-detect 
R1  0          2
R2  1          1

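Additional comment:  unlist() itself has no dim or dimnames arguments,
but if you save that last result (say as res) you can turn it into a
3-way array (column x count type x index) using
        array(unlist(res), dim = c(2, 2, 3),
              dimnames = c(dimnames(res[[1]]), list(names(res))))
For your full data the dim would be c(40, 2, 3).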
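Here is a runnable R sketch of that reshaping. It assumes the same example data, with real NA values, and a hypothetical helper count_codes that builds each group's count matrix. Because unlist() only flattens, the flattened values are passed to array(). In R (not necessarily S-PLUS), sapply(..., simplify = "array") builds the same 3-way array directly:

```r
# Sketch in R: real NA values stand in for the string "NA".
data <- data.frame(index = c("A", "C", "B", "B", "A", "C"),
                   R1 = c(NA, "non-detect", "1.03", "1.55", NA, "non-detect"),
                   R2 = c("345.6", "non-detect", NA, NA, "234.5", NA),
                   stringsAsFactors = FALSE)

# Hypothetical helper: per-group matrix of NA / non-detect counts.
count_codes <- function(dataj) {
  cbind("NA"         = colSums(is.na(dataj)),
        "non-detect" = colSums(!is.na(dataj) & dataj == "non-detect"))
}
res <- lapply(split(data[-1], data$index), count_codes)

# unlist() flattens the matrices column-major, group after group;
# array() then restores (column x count type x index).
arr1 <- array(unlist(res),
              dim = c(ncol(data) - 1, 2, length(res)),
              dimnames = c(dimnames(res[[1]]), list(names(res))))

# R-only shortcut: sapply stacks equal-sized matrices into a 3-way array.
arr2 <- sapply(split(data[-1], data$index), count_codes, simplify = "array")

print(arr1["R2", "NA", ])   # NA counts for R2 in groups A, B, C
```

Computing the dim from ncol(data) and length(res), instead of hard-coding c(2, 2, 3), lets the same call work for 40 columns.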

>Hello All, 
>I have a tapply problem, which I thought was an easy one, but the solution has 
>nonetheless eluded me (more coffee?).
> 
>My dataframe structure is this:
> 
>index   R1          R2          ...   R40
> 
>A       NA          345.6
>C       non-detect  non-detect
>B       1.03        NA
>B       1.55        NA
>A       NA          234.5
>C       non-detect  NA
>.
>.
>. 
> 
>What I need are simultaneous counts of 1) NAs and 2) non-detects for each 
>column Rx, for each index (n=3), across some 40 columns and some 150K records. 
>My idea is to cbind the results of this query into a data frame for analysis. I 
>just can't seem to get the correct syntax. 
> 
>I should say that I have tried flipping between a datatype of factor and 
>character for both the index and the Rx's, but that hasn't helped me, probably 
>since I don't have the syntax of the command correct yet. 
> 
>A further question is, then, what is the (more?) correct datatype (factor or 
>char) for both index and data columns, for proper input into tapply?
> 
>I hope I am clear enough with my explanation.
> 
>best regards, 
>Mike Slattery
>
>
>
>Michael W. Slattery
>Geologist, Ohio EPA
>50 West Town Street, Suite 700
>Columbus OH, 43215
>michael.slattery@epa.state.oh.us
>614-728-1221 (Ph)
>614-644-2909 (Fax)
