Horribly slow aggregate: alternatives?

To: s-news@wubios.wustl.edu
Subject: Horribly slow aggregate: alternatives?
From: Wim Kimmerer <kimmerer@sfsu.edu>
Date: Mon, 21 Jul 2003 12:40:38 -0700
Splusers (v. 61.1, Windows 98, PIII with 256MB):

I have a data frame with about 23000 records and 7 columns.  I want to get
sums of the last 2 columns by unique combinations of the first 5 columns,
which results in about 18000 records.
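
For concreteness, the operation described above is a grouped sum. Here is a minimal sketch of it in Python (not S-PLUS), assuming each record is a list with the 5 grouping values first and the 2 numeric values last. The function name sum_by_keys and the sample rows are made up for illustration and are not from the actual data set. It makes a single pass over the records and looks up each key combination in a dictionary, which is roughly the kind of work a database query would do for the same request.

```python
def sum_by_keys(rows, n_keys=5):
    """Sum the trailing numeric columns of each row, grouped by the
    first n_keys columns. Returns one record per unique key combination,
    in order of first appearance."""
    totals = {}
    for row in rows:
        key = tuple(row[:n_keys])        # the grouping columns
        values = row[n_keys:]            # the columns to be summed
        if key in totals:
            acc = totals[key]
            for i, v in enumerate(values):
                acc[i] += v              # add to the running sums
        else:
            totals[key] = list(values)   # first record for this key
    return [list(key) + sums for key, sums in totals.items()]


# Hypothetical sample records: 5 grouping columns, then 2 numeric columns.
rows = [
    ["A", 1, "x", 2003, "n", 1.0, 10.0],
    ["A", 1, "x", 2003, "n", 2.0, 20.0],
    ["B", 1, "x", 2003, "n", 5.0, 50.0],
]
result = sum_by_keys(rows)
```

With the three sample records, the first two share a key combination and collapse into one output record, so result holds two records.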

So... I used aggregate, and when I couldn't get any work done for several
hours because the computer was humming and grinding away at this problem, I
nuked Splus, exported the data, imported it into Access (YUK) and ran a query
which took... well, I don't know, but it was less than a second.

I looked at Hmisc for alternatives: there is a function called summarize,
but that only works for functions that return >1 value (as far as I know),
and it will not apply the function to each data column separately. You can
give it a matrix, but for each combination of the grouping variables it
applies the function to all of the values in all columns of the matrix
corresponding to those rows. Thus, summarize is not suitable for what I
want to do without needless trickery.

I realize that aggregate uses tapply which uses loops, but geez.... several
hours at least, compared to under one second?  

Is there an alternative that does not loop?

Thanks...Wim
======================
Dr. Wim Kimmerer
Romberg Tiburon Center
San Francisco State University
3152 Paradise Drive
Tiburon CA 94920
Ph. (415) 338-3515
Fax (415) 435-7120
http://online.sfsu.edu/~kimmerer/
