
summary: collapse to unique row.names using mean()

To: ripley@stats.ox.ac.uk
Subject: summary: collapse to unique row.names using mean()
From: "Phillip Staford" <biomining@hotmail.com>
Date: Sun, 19 Jan 2003 12:25:41 -0700
Cc: s-news@lists.biostat.wustl.edu
Brian;

Bingo: I am using S+ 6.1 on Windows (I keep forgetting to mention that) and using data.frame(x, row.names=y, dup.row.names=T) to create non-unique row.names. One of the columns is numeric; the others are text and are duplicated just like the row names. I've found some interesting side effects when you set dup.row.names=T in a data.frame.


Dimitris suggested:

dat <- data.frame(names=sample(rep(LETTERS[1:20], 10000)),
                  values=sample(rep(1:20, 10000), replace=T))
#########
# split the values by name, then average each group
means <- lapply(split(dat$values, dat$names), function(x)
                sum(x)/length(x))
dat3 <- data.frame(values=unlist(means))
dat3

0.68 sec, 6109200 bytes.




Brian Ripley:

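# loop over the unique row names, taking the mean over the rows that share each name
nm <- unique(rows)
res <- as.matrix(df[nm, ])
for(i in 1:length(nm))  res[i, ] <- mean(df[rows==nm[i], ])

# without the loop: difference the cumulative sums at the last row of each group
# (this relies on the rows being sorted by row name, so that each group is contiguous)
cnt <- table(rows); csum <- cumsum(cnt)
new <- df[nm, ]  # to get the row and column labels right
new[] <- lapply(df, function(x) diff(c(0, cumsum(x)[csum]))/cnt)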
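
The cumulative-sum approach above can be illustrated on a toy example. This is only a minimal sketch, written for R rather than S+ 6.1, so it keeps the group labels in a separate vector `rows` (R data frames do not allow duplicate row names). The data values and the names `grp_means`, `res` and `chk` are made up for illustration, and the rows are assumed to be sorted by label already.

```r
# Toy data (hypothetical), already sorted by group label
rows <- c("a", "a", "b", "b", "b", "c")
df <- data.frame(x = c(1, 3, 2, 4, 6, 10),
                 y = c(10, 20, 30, 30, 30, 5))

cnt  <- table(rows)    # group sizes
csum <- cumsum(cnt)    # position of the last row of each group

# Running total at each group end, differenced to give per-group sums,
# then divided by the group sizes to give per-group means
grp_means <- lapply(df, function(x)
    as.vector(diff(c(0, cumsum(x)[csum])) / cnt))
res <- data.frame(grp_means, row.names = names(cnt))

# Cross-check one column against a direct split-and-average
chk <- sapply(split(df$x, rows), mean)
stopifnot(isTRUE(all.equal(res$x, unname(chk))))
```

If the rows are not sorted by label, reordering first (for example with `ord <- order(rows)`, then using `df[ord, ]` and `rows[ord]`) should make each group contiguous before the cumulative sums are taken.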


I am currently testing both for memory use. Both differ from my original attempt, so I appreciate the suggestions.


Thank you for the help;

Phillip
