s-news

Splitting off noncontiguous data.frame subsets

To: "'s-news@lists.biostat.wustl.edu'" <s-news@lists.biostat.wustl.edu>
Subject: Splitting off noncontiguous data.frame subsets
From: "Paul, David A" <paulda@BATTELLE.ORG>
Date: Thu, 31 Oct 2002 10:32:49 -0500
There is assuredly a simple way to do what I have been trying
unsuccessfully to do from scratch:

I have a large data.frame with 65,770 rows and 18 columns.  One of the
variables is an indicator where 0 indicates "response not applicable for
analysis" and 1 indicates "response applicable for analysis".

For a given Column1 - Column2 combination (there are 9 unique values in
Column1 and 41 unique values in Column2, for a total of 369 combinations)
I want to test whether or not the total number of applicable responses
is at least 30.  If so, then I want to append/add/cbind/*whatever* the
entire set of records associated with that Column1 - Column2 combination
to a different data.frame.  Here is my current (and not working) code:

col1 <- as.vector(unique(data.frame$Col1))
col2 <- as.vector(unique(data.frame$Col2))
sampsize.OK <- matrix(data=NA,nrow=1,ncol=18)
for(i in 1:length(col1)){
        for(j in 1:length(col2)){
                temp <- data.frame[data.frame$Col1 == col1[i] &
                                   data.frame$Col2 == col2[j],];
                if(sum(temp$indicator) >= 30){
                        sampsize.OK <- rbind(sampsize.OK,as.matrix(temp))
                }
        }
}

When I use this code, I get the error 

Problem in dim(x) <- c(n, length(collabs)): Length of data (14319) doesn't 
match product of dimensions (15156) 
Use traceback() to see the call stack

I have checked the number of rows and columns for "temp" and they are
appropriate for the dimensions of sampsize.OK.  The problem, judging by
the output of traceback(), is that as.matrix() is for some reason not
able to coerce my temp data frame into a matrix.  The columns in my
original data.frame consist of a mixture of factors and numeric
variables, and I suspect that this is the cause of my problem, but I
don't know how to fix it.

Any insights as to why my code is not working would be greatly
appreciated.  I would also appreciate alternative methods for doing the
same thing.
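
One possible alternative, offered only as a minimal sketch: compute the
number of applicable responses per Col1 - Col2 combination on every row
and subset the data frame directly, which sidesteps the loop and the
as.matrix() step entirely and leaves factor columns as factors.  The
sketch is written against R; whether ave() is available in a given
S-PLUS version is an assumption, and the object name "dat" and the toy
data are hypothetical stand-ins for the real 65,770 x 18 data frame.

```r
# Hypothetical toy data (not the real data set): 2 x 2 combinations,
# 40 rows each, with a 0/1 indicator column as described in the post.
dat <- data.frame(
  Col1      = factor(rep(c("a", "b"), each = 80)),
  Col2      = factor(rep(c("x", "y"), times = 80)),
  indicator = rep(c(1, 1, 1, 0), times = 40),
  response  = seq_len(160)
)

# For every row, the total number of applicable responses (indicator == 1)
# within that row's Col1 - Col2 combination.
n.applicable <- ave(dat$indicator, dat$Col1, dat$Col2, FUN = sum)

# Keep the entire set of records (indicator 0 and 1 alike) for each
# combination with at least 30 applicable responses.
sampsize.OK <- dat[n.applicable >= 30, ]
```

Because the result is built by row-subsetting the data frame rather than
by rbind() on matrices, the mixture of factor and numeric columns is
never coerced to a single type.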

Thanks in advance,

 David Paul, Ph.D.
  Battelle Memorial Institute
  505 King Avenue
  Columbus, OH  43201
  614.424.3176

