s-news

Using aggregate

To: <s-news@lists.biostat.wustl.edu>
Subject: Using aggregate
From: "Thomas Jagger" <tjagger@blarg.net>
Date: Sun, 24 Apr 2005 21:44:44 -0600
In-reply-to: <B998A44C8986644EA8029CFE6396A9241B31EC@exqld2-bne.qld.csiro.au>
I use aggregate to generate a data frame with one row for each code.
In the following example, I compute the means of the fields x and y
within each group of rows that share the same code value.

> test <- data.frame(code=c(1,1,2,2,3,2,1,3), x=1:8, y=log(1:8))
> test
  code x         y 
1    1 1 0.0000000
2    1 2 0.6931472
3    2 3 1.0986123
4    2 4 1.3862944
5    3 5 1.6094379
6    2 6 1.7917595
7    1 7 1.9459101
8    3 8 2.0794415
> aggregate(test[,c("x","y")],test$code,mean)
  test.code        x         y 
1         1 3.333333 0.8796858
2         2 4.333333 1.4255554
3         3 6.500000 1.8444397
> help(aggregate.data.frame)

gives more information. You can also type aggregate.data.frame at the prompt,
without parentheses, to see the code S-PLUS uses for this.
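
For anyone running this in R rather than S-PLUS, here is a minimal sketch of
the same call. It assumes R's aggregate, which requires by to be a list; the
element name code is an illustrative choice and becomes the name of the
grouping column. The second call only swaps FUN to show that other summaries
work the same way.

```r
# Same data as in the example above.
test <- data.frame(code = c(1, 1, 2, 2, 3, 2, 1, 3), x = 1:8, y = log(1:8))

# In R, `by` must be a list; the element name labels the grouping column.
means <- aggregate(test[, c("x", "y")], by = list(code = test$code), FUN = mean)
print(means)

# Swapping FUN generalizes beyond the mean, e.g. per-group medians.
medians <- aggregate(test[, c("x", "y")], by = list(code = test$code), FUN = median)
print(medians)
```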

Tom Jagger

________________________________________
From: s-news-owner@lists.biostat.wustl.edu
[mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of
Bill.Venables@csiro.au
Sent: Sunday, April 24, 2005 4:25 PM
To: andy_liaw@merck.com; mleeds@mlp.com; s-news@lists.biostat.wustl.edu
Subject: Re: [S] friend of mine's question

Another idea would be
 
coef(lm(cbind(price, volume, yield) ~ as.factor(code) - 1, mydat))
 
but this is limited to the special case of the mean.  Andy's idea is easier
to generalize.
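
A small sketch of why this works, using the test data frame from the
aggregate example in this thread in place of mydat (an assumption, since the
real data were not posted). With the intercept removed and code treated as a
factor, each least-squares coefficient is the mean of its group, which is
also why the trick does not extend to other summaries.

```r
# Stand-in data: the test data frame from the aggregate example.
test <- data.frame(code = c(1, 1, 2, 2, 3, 2, 1, 3), x = 1:8, y = log(1:8))

# Multi-response regression on group indicators with no intercept.
fit <- lm(cbind(x, y) ~ as.factor(code) - 1, data = test)
lm_means <- coef(fit)  # 3 x 2 matrix: one row per code, one column per response
print(lm_means)

# Cross-check against group means computed directly.
agg_means <- aggregate(test[, c("x", "y")], by = list(code = test$code), FUN = mean)
same <- all.equal(unname(lm_means), unname(as.matrix(agg_means[, c("x", "y")])))
print(same)
```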
 
-----Original Message-----
From: s-news-owner@lists.biostat.wustl.edu
[mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Liaw, Andy
Sent: Monday, 25 April 2005 3:56 AM
To: 'Leeds, Mark'; s-news@lists.biostat.wustl.edu
Subject: Re: [S] friend of mine's question
One way or the other, the data need to be split into groups.  Here's one
possible way:
 
sapply(split(mydata, mydata$code), colMeans)
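
A sketch of how this generalizes, written for R and using the test data frame
from the aggregate example in this thread as a stand-in for mydata (an
assumption). The grouping column is dropped before splitting so that only x
and y are summarized; the range-width summary is just an illustrative
replacement for colMeans.

```r
# Stand-in data: the test data frame from the aggregate example.
test <- data.frame(code = c(1, 1, 2, 2, 3, 2, 1, 3), x = 1:8, y = log(1:8))

# One data frame per code value, holding only the columns to summarize.
groups <- split(test[, c("x", "y")], test$code)

# Group means: a matrix with one row per variable and one column per code.
group_means <- sapply(groups, colMeans)
print(t(group_means))  # transpose to get one row per code

# Any per-column summary can replace colMeans, e.g. the width of the range.
group_ranges <- sapply(groups, function(d) sapply(d, function(v) diff(range(v))))
print(t(group_ranges))
```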
 
Andy
-----Original Message-----
From: s-news-owner@lists.biostat.wustl.edu
[mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Leeds, Mark
Sent: Sunday, April 24, 2005 1:18 PM
To: s-news@lists.biostat.wustl.edu
Subject: [S] friend of mine's question
A friend of mine has a data set with variables:
 
 code  price volume yield
 
where code is an integer that can take on, say, 20 different values.
 
He wants the means of price, volume, and yield within each code value.
 
Obviously, he can break the data up by code and then take the means,
but he was hoping there was a quicker way to do this. Thanks.
 
                                                mark 

