Hi folks,
Suppose I have a data frame DF1 with columns "Count", "V1", "V2", and "V3".
I want to compute the sum of "Count" for each unique value of "V1",
"V2", and "V3". I decided to use the by() function for this, as
follows:
by(DF1,DF1$"V1",colSums)
by(DF1,DF1$"V2",colSums)
by(DF1,DF1$"V3",colSums)
This works fine. However, since I have some data frames whose column names
are not known in advance, I would like to automate this process as follows:
for (i in 1:ncol(DF1)) {
  colname <- names(DF1[i])
  dfcolumn <- paste("DF1$",colname,sep="")
  by(DF1,dfcolumn,colSums)
}
The problem with this code is that by() requires its second argument,
INDICES, to be a factor or a list of factors (in my case, a column of the
data frame DF1). What I've passed to by() instead is a character string
naming that column. So the call ends up being evaluated as
by(DF1,"DF1$V1",colSums) instead of by(DF1,DF1$V1,colSums).
Does anyone know how I could define "dfcolumn" in the for loop above so that
the by() call receives the column itself rather than its name?
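To make the question reproducible, here is a minimal sketch of the direction I suspect might work. It assumes that DF1[[colname]] returns the column itself when given its name as a string. The toy DF1, the name "group_cols", and the "results" list are made up for illustration, and I only sum "Count" here, on the assumption that the grouping columns may not be numeric.

```r
# Toy data, made up for illustration; the real DF1 has other values.
DF1 <- data.frame(Count = c(1, 2, 3, 4),
                  V1 = c("a", "a", "b", "b"),
                  V2 = c("x", "y", "x", "y"),
                  V3 = c("p", "p", "p", "q"))

# Columns to group by: every column except "Count" (an assumption).
group_cols <- setdiff(names(DF1), "Count")

results <- list()
for (colname in group_cols) {
  # DF1[[colname]] extracts the column itself, not the string "DF1$V1".
  # DF1["Count"] keeps a one-column data frame so colSums() still applies.
  results[[colname]] <- by(DF1["Count"], DF1[[colname]], colSums)
}
print(results)
```

Looping over the column names rather than over 1:ncol(DF1) avoids building the "DF1$..." string altogether, and collecting the output in a list keeps it available after the loop, since by() results are not printed automatically inside a for loop.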
Thanks,
Reid Gilliam