Brian,
Bingo: I am using S+ 6.1 on Windows (I keep forgetting to mention that) and
using data.frame(x, row.names=y, dup.row.names=T) to create non-unique
row names. One of the columns is numeric; the others are text, but they are
duplicated just like the row names. I've found some interesting side
effects when you set dup.row.names=T in a data.frame() call.

Dimitris suggested:

dat <- data.frame(names=sample(rep(LETTERS[1:20], 10000)),
                  values=sample(rep(1:20, 10000), replace=T))
#########
means <- lapply(split(dat$values, dat$names), function(x)
                sum(x)/length(x))
dat3 <- data.frame(values=unlist(means))
dat3

0.68 sec, 6109200 bytes.
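
As a quick check of what the split/lapply step computes, here is a minimal sketch on a tiny data set, compared against tapply(). The data frame `small` is made up for illustration and is not from the thread; the code is written in R syntax, and I am assuming S+ behaves the same way here.

```r
# Hypothetical toy data (not from the thread): three groups, six values.
small <- data.frame(names  = c("A", "B", "A", "C", "B", "A"),
                    values = c(1, 2, 3, 4, 6, 5))

# Split the values by group and average each piece, as in the suggestion above.
means <- lapply(split(small$values, small$names),
                function(x) sum(x) / length(x))
by_split <- unlist(means)

# tapply() computes the same per-group means in a single call.
by_tapply <- tapply(small$values, small$names, mean)
```

Both by_split and by_tapply hold one mean per distinct name, ordered by the sorted group labels.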

Brian Ripley:

nm <- unique(rows)
res <- as.matrix(df[nm, ])
for(i in 1:length(nm)) res[i, ] <- mean(df[rows==nm[i], ])

cnt <- table(rows); csum <- cumsum(cnt)
new <- df[nm, ] # to get the row and column labels right
new[] <- lapply(df, function(x) diff(c(0, cumsum(x)[csum]))/cnt)

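
To see how the cumsum() version works, here is a minimal sketch on a single made-up column. The vectors `rows` and `x` are hypothetical, not from the thread. Note the assumption the trick relies on: the rows must already be grouped contiguously, in the same (sorted) order that table() reports the labels, so that the cumulative counts point at the last row of each group.

```r
# Hypothetical toy data (not from the thread); rows already sorted by label.
rows <- c("a", "a", "b", "b", "b", "c")
x    <- c(1, 3, 2, 4, 6, 10)

cnt  <- table(rows)   # group sizes: a = 2, b = 3, c = 1
csum <- cumsum(cnt)   # position of the last row in each group: 2, 5, 6

# cumsum(x)[csum] is the running total at the end of each group.
# Differencing those totals gives the per-group sums, and dividing by
# the group sizes gives the per-group means.
group_sums  <- diff(c(0, cumsum(x)[csum]))
group_means <- group_sums / as.vector(cnt)
```

If the rows are not sorted by label, sorting them first (for example with order(rows)) should restore the assumption; otherwise the sums mix values from different groups.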
I am currently testing both for memory use. Both differ from my original
attempt, so I appreciate the suggestions.
Thank you for the help,
Phillip