s-news

Re: group by

To: "Neung-Hwan Oh" <ultisol@gmail.com>
Subject: Re: group by
From: "Wensui Liu" <liuwensui@gmail.com>
Date: Wed, 14 Mar 2007 19:07:18 -0400
Cc: Rich@mango-solutions.com, s-news@wubios.wustl.edu

The following is an example copied from my blog. I hope it helps.

CALCULATE GROUP SUMMARY IN R

##################################################
# HOW TO CALCULATE GROUP SUMMARY IN R            #
##################################################
# EQUIVALENT SAS CODE:                           #
#                                                #
# DATA DATA;                                     #
#   DO I = 1 TO 2;                               #
#     DO J = 1 TO 4;                             #
#       GROUP = 'TREATMENT_'||PUT(I, 1.);        #
#       X = RANNOR(1);                           #
#       OUTPUT;                                  #
#     END;                                       #
#   END;                                         #
#   KEEP GROUP X;                                #
# RUN;                                           #
#                                                #
# PROC SQL;                                      #
#   CREATE TABLE COMBINE AS                      #
#   SELECT *, MEAN(X) AS MEAN_X, SUM(X) AS SUM_X #
#   FROM DATA                                    #
#   GROUP BY GROUP;                              #
# QUIT;                                          #
##################################################

# GENERATE A TREATMENT GROUP (4 ROWS PER GROUP, AS IN THE SAS DO LOOPS) #
group <- as.factor(paste("treatment", rep(1:2, each = 4), sep = "_"))

# CREATE A SERIES OF RANDOM VALUES #
x <- rnorm(length(group))

# CREATE A DATA FRAME TO COMBINE THE ABOVE TWO #
data <- data.frame(group, x)

# CALCULATE SUMMARY FOR X #
x.mean <- tapply(data$x, data$group, mean, na.rm = TRUE)
x.sum <- tapply(data$x, data$group, sum, na.rm = TRUE)

# CREATE A DATA FRAME TO COMBINE SUMMARIES #
summ <- data.frame(x.mean, x.sum, group = names(x.mean))

# COMBINE DATA AND SUMMARIES TOGETHER (ONE ROW PER ORIGINAL OBSERVATION) #
combine <- merge(data, summ, by = "group")
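
As a possible shortcut in R, ave() attaches a group-level summary to every row directly, which mirrors the SELECT *, MEAN(X), SUM(X) ... GROUP BY result without a separate merge step. This is only a sketch: the names dat, mean_x, and sum_x are made up for illustration, and I have not checked whether S-Plus offers the same function.

```r
# Hypothetical stand-in for the data frame built above (names are illustrative).
set.seed(1)
dat <- data.frame(
  group = factor(paste("treatment", rep(1:2, each = 4), sep = "_")),
  x     = rnorm(8)
)

# ave() returns one value per row: the group-level summary repeated
# for every member of the group, so no merge step is needed.
dat$mean_x <- ave(dat$x, dat$group, FUN = mean)
dat$sum_x  <- ave(dat$x, dat$group, FUN = sum)

# Cross-check against the tapply() route: look up each row's group mean by name.
by_group <- tapply(dat$x, dat$group, mean)
stopifnot(all.equal(dat$mean_x, as.vector(by_group[as.character(dat$group)])))
```

The merge() approach is still the more general one when the summaries need to live in their own table as well.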

On 3/14/07, Neung-Hwan Oh <ultisol@gmail.com> wrote:
Thanks a lot for the quick replies.
But don't "tapply" and "aggregate" return a matrix rather than a
data frame?
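
On the return-format question, a small check in R may help. This is a sketch under the assumption that R is being used (S-Plus may differ in details), and df with its columns site and value is a made-up toy example. In R, tapply() with a single grouping factor gives a named one-dimensional array (a matrix or higher-dimensional array with several factors), while aggregate() gives a data frame.

```r
# Hypothetical toy data; df and its column names are illustrative only.
df <- data.frame(
  site  = c(1, 2, 2, 3, 3, 3),
  value = c(1, 2, 4, 3, 6, 9)
)

# tapply() with one grouping factor returns a named 1-d array.
tap <- tapply(df$value, df$site, mean)
is.data.frame(tap)   # FALSE

# aggregate() returns a data frame with one row per group.
agg <- aggregate(df["value"], by = list(site = df$site), FUN = mean)
is.data.frame(agg)   # TRUE

# A tapply() result can be converted to a data frame by hand.
tap.df <- data.frame(site = names(tap), value = as.vector(tap))
```

So if a data frame is what is needed downstream, aggregate() gets there directly, and a tapply() result takes one extra conversion step.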

On 3/14/07, Rich@mango-solutions.com <Rich@mango-solutions.com> wrote:
> Sorry ...
>
> I mean "aggregate(df$value, df[,1:3], mean)"
>
> Rich.
> mangosolutions
>
> -----Original Message-----
> From: Rich@Mango-Solutions.com [mailto:Rich@Mango-Solutions.com]
> Sent: 14 March 2007 21:14
> To: 'Wensui Liu'; 'Neung-Hwan Oh'
> Cc: s-news@wubios.wustl.edu
> Subject: Re: [S] group by
>
> Yes ... you can use tapply in S.  For the structure you're using, I'd
> probably recommend "aggregate" though ...
>
> Something like "aggregate(df[,1:3], df$value, mean)"
>
> Rich.
> mangosolutions
>
> -----Original Message-----
> From: Wensui Liu [mailto:liuwensui@gmail.com]
> Sent: 14 March 2007 21:02
> To: Neung-Hwan Oh
> Cc: s-news@wubios.wustl.edu
> Subject: Re: [S] group by
>
> I am not sure whether S-Plus has a nice function like tapply() in R.
>
> On 3/14/07, Neung-Hwan Oh <ultisol@gmail.com> wrote:
> > Hello,
> >
> > How can you calculate the following example in s-plus?   In Access, it is
> > relatively easy with "Group By" and I am wondering whether there is a
> > similar function that I missed in S-Plus.
> >
> >
> >
> > "From this table"
> >
> > site.no  date        time   value
> > 1        1989/04/27  12:00  1.0
> > 2        1975/10/01  19:00  2.0
> > 2        1975/10/01  20:00  4.0
> > 3        1993/04/10  09:00  3.0
> > 3        1993/04/10  12:00  6.0
> > 3        1993/04/10  15:00  9.0
> >
> > "To this (averages per date per site)" + (count column?)
> >
> > 1        1989/04/27  12:00  1.0  (1.0)
> > 2        1975/10/01  19:30  3.0  (2.0)
> > 3        1993/04/10  12:00  6.0  (3.0)
> >
> >
> >
> > Many thanks!
> >
> > NH
>
>
> --
> WenSui Liu
> A lousy statistician who happens to know a little programming
> (http://spaces.msn.com/statcompute/blog)
> --------------------------------------------------------------------
> This message was distributed by s-news@lists.biostat.wustl.edu.  To
> unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
> the BODY of the message:  unsubscribe s-news
>
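
For the table in the original question quoted above, one way to get the per-site, per-date average together with a count is sketched below. It is written for R and not tested in S-Plus; the names obs, keys, avg, cnt, and result are hypothetical. Note that it groups by site.no and date only, since grouping by the time column as well would put every row in its own group. Averaging the time column itself is left out, because that would first need the times converted to a numeric form.

```r
# The table from the original question; "obs" is a hypothetical name.
obs <- data.frame(
  site.no = c(1, 2, 2, 3, 3, 3),
  date    = c("1989/04/27", "1975/10/01", "1975/10/01",
              "1993/04/10", "1993/04/10", "1993/04/10"),
  time    = c("12:00", "19:00", "20:00", "09:00", "12:00", "15:00"),
  value   = c(1, 2, 4, 3, 6, 9)
)

# Group by site and date only; grouping by time as well would leave
# every row in its own group.
keys <- list(site.no = obs$site.no, date = obs$date)

# Mean of value per group, and number of rows per group.
avg <- aggregate(obs["value"], by = keys, FUN = mean)
cnt <- aggregate(obs["value"], by = keys, FUN = length)
names(cnt)[names(cnt) == "value"] <- "count"

# One row per site/date with the mean value and the row count.
result <- merge(avg, cnt, by = c("site.no", "date"))
result <- result[order(result$site.no), ]
```

Both intermediate results are data frames, so the final merge() is an ordinary data-frame join on the two key columns.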



--
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)
