I think of the "tapply" function as returning a (simplified) list structure.
Typically this is a single-mode structure when the function returns a single
value (e.g. a vector, matrix or array, depending on the number of "by"
variables). When the function returns multiple values, tapply returns a list
structure (e.g. try "tapply(fuel.frame$Mileage, fuel.frame$Type, range)").
> data
        group          x
1 treatment_1 -0.8514052
2 treatment_2 -0.2327822
3 treatment_1  0.3106438
4 treatment_2 -0.5681113
5 treatment_1 -1.2298195
6 treatment_2 -0.1028086
7 treatment_1  0.8201559
8 treatment_2 -0.9596621
> tapply(data$x, data$group, mean)  # Returns a vector
 treatment_1 treatment_2
  -0.2376172   -0.465841
> tapply(data$x, data$group, range)  # Returns a list
$"treatment_1":
[1] -1.2298195  0.8201559
$"treatment_2":
[1] -0.9596621 -0.1028086
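If you want the per-group ranges as a matrix rather than a list, one option in R is to split the values by group and let sapply() simplify the length-2 results. This is a minimal sketch that retypes the eight values from the transcript above; the names `r`, "min" and "max" are illustrative, and it has not been checked against S-PLUS.

```r
# The eight values and groups printed in the transcript above
x <- c(-0.8514052, -0.2327822, 0.3106438, -0.5681113,
       -1.2298195, -0.1028086, 0.8201559, -0.9596621)
group <- factor(rep(c("treatment_1", "treatment_2"), 4))

# split() gives a named list of per-group vectors; sapply() applies range()
# to each one and simplifies the results into a 2 x (number of groups) matrix,
# which t() turns into one row per group.
r <- t(sapply(split(x, group), range))
colnames(r) <- c("min", "max")
print(r)
```

From there, `data.frame(group = rownames(r), r)` gives a data frame with one row per group.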
The aggregate function will return a data frame. However, the function you
pass to aggregate can only return a single value per group:
> aggregate(data$x, data$group, mean)
                  Group          x
treatment_1 treatment_1 -0.2376172
treatment_2 treatment_2 -0.4658410
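Because each aggregate call yields one summary per group, a common workaround is to call it once per summary and merge the results. The sketch below applies that to the table from the original question (retyped from it), grouping by site.no and date only so that the rows are averaged over time. It is written for R and assumes times are "HH:MM" strings; the names `by.vars`, `minutes` and `count` are hypothetical.

```r
# The example table from the original question
df <- data.frame(
  site.no = c(1, 2, 2, 3, 3, 3),
  date    = c("1989/04/27", "1975/10/01", "1975/10/01",
              "1993/04/10", "1993/04/10", "1993/04/10"),
  time    = c("12:00", "19:00", "20:00", "09:00", "12:00", "15:00"),
  value   = c(1, 2, 4, 3, 6, 9),
  stringsAsFactors = FALSE
)

# Convert "HH:MM" to minutes after midnight so that times can be averaged
df$minutes <- as.numeric(substr(df$time, 1, 2)) * 60 +
              as.numeric(substr(df$time, 4, 5))

# Group by site and date only (time is what gets averaged)
by.vars <- df[, c("site.no", "date")]

# One aggregate() call per kind of summary
means  <- aggregate(df[, c("minutes", "value")], by = by.vars, FUN = mean)
counts <- aggregate(df[, "value", drop = FALSE], by = by.vars, FUN = length)
names(counts)[names(counts) == "value"] <- "count"

# Combine the summaries and turn the mean minutes back into "HH:MM"
result <- merge(means, counts, by = c("site.no", "date"))
result$time <- sprintf("%02d:%02d", as.integer(result$minutes %/% 60),
                       as.integer(round(result$minutes %% 60)))
result <- result[order(result$site.no),
                 c("site.no", "date", "time", "value", "count")]
print(result)
```

This reproduces the requested layout: 12:00/1.0/1, 19:30/3.0/2 and 12:00/6.0/3 for sites 1, 2 and 3.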
So, whether you use tapply or aggregate on your data, you'll still need to
do some manipulation of the results to get the layout you want.
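On the matrix-versus-data-frame question: with two "by" variables tapply returns a matrix, and in R one way to flatten it into a long data frame is `as.data.frame(as.table(...))`. A minimal sketch; the values and the `block` factor are made up purely for illustration.

```r
# Hypothetical data: two grouping factors
x     <- c(1, 2, 3, 4, 5, 6, 7, 8)
group <- factor(rep(c("treatment_1", "treatment_2"), 4))
block <- factor(rep(c("A", "B"), each = 4))

# Two "by" variables, so tapply() returns a 2 x 2 matrix with named dimnames
m <- tapply(x, list(group = group, block = block), mean)

# as.table() keeps the dimnames; as.data.frame() then gives one row per cell,
# with the cell values in a column named by responseName
long <- as.data.frame(as.table(m), responseName = "mean_x")
print(long)
```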
Note: there is a small bug in aggregate (fixed in S-PLUS 7) whereby extra
arguments supplied through the ellipsis (...) are not passed on to the
function call.
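Where the ellipsis arguments are not passed on, one workaround is to bake the extra argument into an anonymous function, so nothing depends on `...` being forwarded. The sketch below is written and checked in R (where aggregate does forward `...`, so the wrapper is merely harmless); the data are hypothetical.

```r
# Hypothetical data containing a missing value
x <- c(1, NA, 3, 4)
g <- factor(c("a", "a", "b", "b"))

# Instead of aggregate(x, list(group = g), mean, na.rm = TRUE),
# put na.rm = TRUE inside the function that aggregate() calls
res <- aggregate(x, by = list(group = g),
                 FUN = function(v) mean(v, na.rm = TRUE))
print(res)
```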
Cheers,
Rich.
mangosolutions
-----Original Message-----
From: Wensui Liu [mailto:liuwensui@gmail.com]
Sent: 14 March 2007 23:07
To: Neung-Hwan Oh
Cc: Rich@mango-solutions.com; s-news@wubios.wustl.edu
Subject: Re: [S] group by
Following is an example I copied from my blog. HTH.
CALCULATE GROUP SUMMARY IN R
##################################################
# HOW TO CALCULATE GROUP SUMMARY IN R
##################################################
# EQUIVALENT SAS CODE:
#
#   DATA DATA;
#     DO I = 1 TO 2;
#       DO J = 1 TO 4;
#         GROUP = 'TREATMENT_'||PUT(I, 1.);
#         X = RANNOR(1);
#         OUTPUT;
#       END;
#     END;
#     KEEP GROUP X;
#   RUN;
#
#   PROC SQL;
#     CREATE TABLE COMBINE AS
#     SELECT *, MEAN(X) AS MEAN_X, SUM(X) AS SUM_X
#     FROM DATA
#     GROUP BY GROUP;
#   QUIT;
##################################################
# GENERATE A TREATMENT GROUP
group <- as.factor(paste("treatment", rep(1:2, 4), sep = "_"))
# CREATE A SERIES OF RANDOM VALUES
x <- rnorm(length(group))
# CREATE A DATA FRAME TO COMBINE THE ABOVE TWO
data <- data.frame(group, x)
# CALCULATE SUMMARY FOR X
x.mean <- tapply(data$x, data$group, mean, na.rm = TRUE)
x.sum <- tapply(data$x, data$group, sum, na.rm = TRUE)
# CREATE A DATA FRAME TO COMBINE SUMMARIES
summ <- data.frame(x.mean, x.sum, group = names(x.mean))
# COMBINE DATA AND SUMMARIES TOGETHER
combine <- merge(data, summ, by = "group")
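In R, the SAS "remerge" step can also be done without a separate summary table: ave() returns a vector as long as its input, with each element replaced by its group's summary. A minimal sketch in R (not checked against S-PLUS); the seed and the `mean_x`/`sum_x` column names are illustrative.

```r
set.seed(1)  # illustrative seed, only so the example is reproducible
group <- factor(paste("treatment", rep(1:2, 4), sep = "_"))
x <- rnorm(length(group))
data <- data.frame(group, x)

# ave() applies FUN within each group and recycles the result back to
# every row, so no merge() step is needed and the row order is unchanged
data$mean_x <- ave(data$x, data$group, FUN = mean)
data$sum_x  <- ave(data$x, data$group, FUN = sum)
print(data)
```

Unlike merge(), which sorts the result by the key column, this keeps the rows in their original order.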
On 3/14/07, Neung-Hwan Oh <ultisol@gmail.com> wrote:
> Thanks a lot for the quick replies.
> But, doesn't "tapply" or "aggregate" provide a matrix format rather
> than data frame format?
>
> On 3/14/07, Rich@mango-solutions.com <Rich@mango-solutions.com> wrote:
> > Sorry ...
> >
> > I mean "aggregate(df$value, df[,1:3], mean)"
> >
> > Rich.
> > mangosolutions
> >
> > -----Original Message-----
> > From: Rich@Mango-Solutions.com [mailto:Rich@Mango-Solutions.com]
> > Sent: 14 March 2007 21:14
> > To: 'Wensui Liu'; 'Neung-Hwan Oh'
> > Cc: s-news@wubios.wustl.edu
> > Subject: Re: [S] group by
> >
> > Yes ... you can use tapply in S. For the structure you're using, I'd
> > probably recommend "aggregate" though ...
> >
> > Something like "aggregate(df[,1:3], df$value, mean)"
> >
> > Rich.
> > mangosolutions
> >
> > -----Original Message-----
> > From: Wensui Liu [mailto:liuwensui@gmail.com]
> > Sent: 14 March 2007 21:02
> > To: Neung-Hwan Oh
> > Cc: s-news@wubios.wustl.edu
> > Subject: Re: [S] group by
> >
> > I am not sure whether S-PLUS has a nice function like tapply() in R.
> >
> > On 3/14/07, Neung-Hwan Oh <ultisol@gmail.com> wrote:
> > > Hello,
> > >
> > > How can you calculate the following example in S-PLUS? In Access, it is
> > > relatively easy with "Group By", and I am wondering whether there is a
> > > similar function that I missed in S-PLUS.
> > >
> > > "From this table"
> > >
> > > site.no       date  time value
> > >       1 1989/04/27 12:00   1.0
> > >       2 1975/10/01 19:00   2.0
> > >       2 1975/10/01 20:00   4.0
> > >       3 1993/04/10 09:00   3.0
> > >       3 1993/04/10 12:00   6.0
> > >       3 1993/04/10 15:00   9.0
> > >
> > > "To this (averages per date per site)" + (count column?)
> > >
> > >       1 1989/04/27 12:00   1.0 (1.0)
> > >       2 1975/10/01 19:30   3.0 (2.0)
> > >       3 1993/04/10 12:00   6.0 (3.0)
> > >
> > > Many thanks!
> > >
> > > NH
> >
> >
> > --
> > WenSui Liu
> > A lousy statistician who happens to know a little programming
> > (http://spaces.msn.com/statcompute/blog)
> > --------------------------------------------------------------------
> > This message was distributed by s-news@lists.biostat.wustl.edu. To
> > unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
> > the BODY of the message: unsubscribe s-news
>
--
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)