There is assuredly a simple way to do what I have been trying unsuccessfully
to do from scratch:
I have a large data.frame with 65,770 rows and 18 columns. One of the
variables is an indicator where 0 indicates "response not applicable for
analysis" and 1 indicates "response applicable for analysis".
For a given Column1 - Column2 combination (there are 9 unique values in
Column1 and 41 unique values in Column2, for a total of 369 combinations)
I want to test whether or not the total number of applicable responses is
at least 30. If so, then I want to append/add/cbind/*whatever* the entire
set of records associated with that Column1 - Column2 combination to a
different data.frame. Here is my current (and not working) code:
col1 <- as.vector(unique(data.frame$Col1))
col2 <- as.vector(unique(data.frame$Col2))
sampsize.OK <- matrix(data=NA, nrow=1, ncol=18)
for(i in 1:length(col1)){
  for(j in 1:length(col2)){
    temp <- data.frame[data.frame$Col1 == col1[i] &
                       data.frame$Col2 == col2[j], ]
    if(sum(temp$indicator) >= 30){
      sampsize.OK <- rbind(sampsize.OK, as.matrix(temp))
    }
  }
}
When I use this code, I get the error:
Problem in dim(x) <- c(n, length(collabs)): Length of data (14319) doesn't match product of dimensions (15156)
Use traceback() to see the call stack
I have checked the number of rows and columns for "temp" and they are
appropriate for the dimensions of sampsize.OK. According to traceback(),
the problem is that as.matrix() is for some reason unable to coerce my
temp data frame into a matrix.
The columns in my original data.frame consist of a mixture of factors and
numeric variables, and I suspect that this is the cause of my problem, but
I don't know how to fix it.
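To make the suspicion above concrete, here is a minimal sketch in R (S-PLUS may behave differently, and this is not confirmed to be the source of the dim() error). The toy data frame and its values are made up for illustration. It shows that as.matrix() on a data frame mixing factors and numeric columns produces a character matrix, whereas subsetting the data frame directly keeps each column's type.

```r
# Toy data frame mixing a factor with numeric columns (hypothetical values).
toy <- data.frame(
  Col1 = factor(c("a", "a", "b")),
  score = c(1.5, 2.0, 3.25),
  indicator = c(1, 0, 1)
)

# A matrix can hold only one type, so the factor column forces
# every column to be converted to character.
m <- as.matrix(toy)
is.character(m)
dim(m)

# Subsetting the data frame itself keeps the factor and numeric columns intact.
sub <- toy[toy$Col1 == "a", ]
sapply(sub, class)
```

If this is indeed the cause, accumulating the selected rows as data frames rather than matrices would sidestep the coercion altogether.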
Any insights as to why my code is not working would be greatly appreciated.
I would also appreciate alternative methods for doing the same thing.
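One possible alternative, sketched here in R under stated assumptions, avoids both the loop and the matrix conversion. It uses the base R function ave() to attach each combination's total of applicable responses to every row, then keeps the qualifying rows with a single logical subset. The data frame name dat and its contents are hypothetical stand-ins for the real data, and whether ave() is available in a given S-PLUS version is not something this sketch can vouch for.

```r
# Hypothetical stand-in for the real data: three Col1-Col2 combinations.
#   A-x: 40 rows, 35 applicable  (should be kept)
#   A-y: 40 rows, 20 applicable  (enough rows, too few applicable)
#   B-x: 10 rows, 10 applicable  (too few rows)
dat <- data.frame(
  Col1 = factor(rep(c("A", "A", "B"), times = c(40, 40, 10))),
  Col2 = factor(rep(c("x", "y", "x"), times = c(40, 40, 10))),
  value = seq_len(90) / 10,
  indicator = c(rep(c(1, 0), times = c(35, 5)),
                rep(c(1, 0), times = c(20, 20)),
                rep(1, 10))
)

# For every row, the total of indicator within its Col1-Col2 combination.
n.applicable <- ave(dat$indicator, dat$Col1, dat$Col2, FUN = sum)

# Keep all records from combinations with at least 30 applicable responses.
sampsize.OK <- dat[n.applicable >= 30, ]
nrow(sampsize.OK)
```

Because the result is an ordinary subset of the original data frame, the factor and numeric columns keep their types and no placeholder row of NAs is needed.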
Thanks in advance,
David Paul, Ph.D.
Battelle Memorial Institute
505 King Avenue
Columbus, OH 43201
614.424.3176