s-news

Re: Question about aggregate() and statistics with greater than

To: "Li, Mike" <LiMIK@cder.fda.gov>
Subject: Re: Question about aggregate() and statistics with greater than
From: Sundar Dorai-Raj <sundar.dorai-raj@PDF.COM>
Date: Fri, 21 May 2004 12:04:07 -0700
Cc: s-news@lists.biostat.wustl.edu
In-reply-to: <4C88DC099E9AF945A6DA4D6FFA1865D1027927F3@cdsx06.cder.fda.gov>
Organization: PDF Solutions, Inc.
References: <4C88DC099E9AF945A6DA4D6FFA1865D1027927F3@cdsx06.cder.fda.gov>
Reply-to: sundar.dorai-raj@PDF.COM


Li, Mike wrote:

Can you suggest a way to make the aggregate function return more than a single value for each subset of the data?

For example, suppose you have the following dataset:
dat _ data.frame(ID=c(rep(1,4),rep(2,4),rep(3,4)), TIME=rep(c(0,0,1,1), 3))

The aggregate function accepts, as its FUN argument, functions that return a single statistic per subset, such as mean.
For example, the following code succeeds:
aggregate(dat$TIME, list(dat$ID),mean)

However, I want aggregate to apply functions that return several values for each subset, such as "unique".

For example, the following code fails:
aggregate(dat$TIME, list(dat$ID),unique)

Can you suggest an efficient way to perform operations on subsets of data?
Thanks much,
Mike Li


Take a look at ?by:

by(dat$TIME, list(dat$ID), unique)

--sundar

P.S. Using "_" for assignment is deprecated as of S-PLUS 6.0 and should not be used; write "<-" instead.
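
For comparison, here is a minimal sketch of two other ways to apply a function that returns several values to each subset, using split() with lapply(), and tapply(). It reuses the dataset from the question; the names res.list and res.tapply are made up for illustration, and the code assumes a version of S-PLUS or R in which these functions behave as documented.

```r
# Same data as in the question, written with "<-" rather than "_".
dat <- data.frame(ID = c(rep(1, 4), rep(2, 4), rep(3, 4)),
                  TIME = rep(c(0, 0, 1, 1), 3))

# split() breaks TIME into one vector per ID; lapply() then applies
# unique() to each piece and returns a list, so the results are free
# to have different lengths.
res.list <- lapply(split(dat$TIME, dat$ID), unique)

# tapply() does the grouping and the function call in one step.
res.tapply <- tapply(dat$TIME, dat$ID, unique)
```

Both results hold one component per ID, so res.list[["1"]] gives the unique TIME values for ID 1. Unlike by(), these return plain list-like objects without a special print method, which may be easier to post-process.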


