
Re: Horribly slow aggregate: alternatives?

To: Wim Kimmerer <kimmerer@sfsu.edu>
Subject: Re: Horribly slow aggregate: alternatives?
From: Frank E Harrell Jr <fharrell@virginia.edu>
Date: Mon, 21 Jul 2003 15:59:06 -0400
Cc: s-news@wubios.wustl.edu
In-reply-to: <3.0.6.32.20030721124038.008f2420@sfsu.edu>
Organization: University of Virginia
References: <3.0.6.32.20030721124038.008f2420@sfsu.edu>
On Mon, 21 Jul 2003 12:40:38 -0700
Wim Kimmerer <kimmerer@sfsu.edu> wrote:

> Splusers (v. 61.1, Windows 98, PIII with 256MB):
> 
> I have a data frame with about 23000 records and 7 columns.  I want to get
> sums of the last 2 columns by unique combinations of the first 5 columns,
> which results in about 18000 records.
> 
> So... I used aggregate, and when I couldn't get any work done for several
> hours because the computer was humming and grinding away at this problem, I
> nuked Splus, exported the data, imported into Access (YUK) and ran a query
> which took.... well I don't know but it was less than a second.
> 
> I looked at Hmisc for alternatives: there is a function called summarize,
> but that only works for functions that return >1 value (as far as I know),
> and will not run the function on more than one data value (you can give it
> a matrix but for each combination of the grouping variables, it performs
> the function on all of the values in all columns of the matrix
> corresponding to those rows).  Thus, summarize is not suitable for what I
> want to do, without needless trickery.

Wim - This posting to s-news was premature: I had just finished sending you a
private reply to the private message you also sent.  summarize works fine with
only one statistic, but you may need to update your version of summarize and of
mApply, a function it calls.  See the note I just sent.  -FH
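
For reference, the grouped sum described in the quoted message (sums of the
last 2 columns for each unique combination of the first 5) needs only a single
pass over the rows when the running totals are kept in a lookup table keyed by
the grouping values. The sketch below illustrates that idea in Python rather
than S. It is not how aggregate, summarize, or Access work internally; the
function name grouped_sums, the n_keys parameter, and the sample rows are all
made up for the illustration.

```python
def grouped_sums(rows, n_keys=5):
    """Sum the trailing columns of each row, grouped by the first n_keys columns.

    Hypothetical helper for illustration only: each row is a tuple whose first
    n_keys entries identify the group and whose remaining entries are numeric.
    """
    totals = {}
    for row in rows:
        key = tuple(row[:n_keys])      # the combination of grouping columns
        values = row[n_keys:]          # the columns to be summed
        if key not in totals:
            totals[key] = list(values) # first row seen for this group
        else:
            acc = totals[key]
            for i, v in enumerate(values):
                acc[i] += v            # add to the running totals
    return totals


# Made-up data: five grouping columns followed by two numeric columns.
rows = [
    ("a", 1, "x", 2003, "n", 1.0, 10.0),
    ("a", 1, "x", 2003, "n", 2.0, 20.0),
    ("b", 2, "y", 2003, "s", 5.0, 50.0),
]

for key, sums in grouped_sums(rows).items():
    print(key, sums)
```

Each row is touched once and each group lookup is a single table access, so
the work grows with the number of rows rather than with rows times groups.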

> 
> I realize that aggregate uses tapply which uses loops, but geez.... several
> hours at least, compared to under one second?  
> 
> Is there an alternative that does not loop?
> 
> Thanks...Wim
> ======================
> Dr. Wim Kimmerer
> Romberg Tiburon Center
> San Francisco State University
> 3152 Paradise Drive
> Tiburon CA 94920
> Ph. (415) 338-3515
> Fax (415) 435-7120
> http://online.sfsu.edu/~kimmerer/


---
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
