s-news

Re: Horribly slow aggregate: alternatives?

To: "'Wim Kimmerer'" <kimmerer@sfsu.edu>, s-news@wubios.wustl.edu
Subject: Re: Horribly slow aggregate: alternatives?
From: "Pikounis, Bill" <v_bill_pikounis@merck.com>
Date: Mon, 21 Jul 2003 16:21:14 -0400
Wim,
You mention you are using a data frame: it may be that your row.names
vector is slowing things down, because S-PLUS may be checking it for
duplicates. I do not have access to S-PLUS 6.1 for Windows Help at the
moment, but look at the ?data.frame help page for an argument called
"dup.row.names" or "dup.names.ok" (my memory fails me) that allows
duplicate row.names, and make sure that it is set to TRUE.
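A different route to the same result, sketched here under stated assumptions rather than taken from this thread: sort the rows once by the grouping columns, mark where each key combination starts, and get each group's total as a difference of cumulative sums at the group boundaries. The code loops over the 5 key columns and the 2 value columns, never over the ~18000 groups. The function name group.sums and its by/vals arguments are hypothetical; the code is written in R and has not been checked under S-PLUS 6.1, and it assumes the key columns contain no NAs.

```r
# Grouped sums without looping over groups (R sketch; not tested in S-PLUS 6.1).
# Hypothetical helper: 'by' = key column positions, 'vals' = numeric columns to sum.
# Assumes the key columns contain no NAs.
group.sums <- function(dat, by = 1:5, vals = 6:7) {
  n <- nrow(dat)
  if (n == 0) return(dat[0, c(by, vals), drop = FALSE])
  # Sort once so that rows sharing a key combination are adjacent
  o <- do.call("order", unname(as.list(dat[by])))
  d <- dat[o, , drop = FALSE]
  key <- d[by]
  # A row starts a new group if any key column differs from the previous row;
  # this loops over the key columns only, not over the groups
  same <- rep(TRUE, n - 1)
  for (j in seq_along(key)) {
    k <- key[[j]]
    same <- same & (k[-1] == k[-n])
  }
  newgrp <- c(TRUE, !same)
  # Last row of each group: one before the next group's first row, plus row n
  last <- c(which(newgrp)[-1] - 1, n)
  out <- key[newgrp, , drop = FALSE]
  for (v in vals) {
    # Running total read off at each group's last row, then differenced
    tot <- cumsum(d[[v]])[last]
    out[[names(d)[v]]] <- c(tot[1], diff(tot))
  }
  row.names(out) <- NULL
  out
}
```

For the data described in the question, a call such as group.sums(mydata) (mydata being a hypothetical name) would return one row per key combination with the two summed columns. Because the sums come from differencing a running total, large double-precision values can pick up small rounding differences; in R, base rowsum() is a compiled alternative worth comparing.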

Hope that helps.

Bill

----------------------------------------
Bill Pikounis, Ph.D.

Biometrics Research Department
Merck Research Laboratories
PO Box 2000, MailDrop RY33-300  
126 E. Lincoln Avenue
Rahway, New Jersey 07065-0900
USA

v_bill_pikounis@merck.com

Phone: 732 594 3913
Fax: 732 594 1565


> -----Original Message-----
> From: Wim Kimmerer [mailto:kimmerer@sfsu.edu] 
> Sent: Monday, July 21, 2003 3:41 PM
> To: s-news@wubios.wustl.edu
> Subject: [S] Horribly slow aggregate: alternatives?
> 
> 
> Splusers (v. 6.1, Windows 98, PIII with 256MB):
> 
> I have a data frame with about 23000 records and 7 columns. I want to
> get sums of the last 2 columns by unique combinations of the first 5
> columns, which results in about 18000 records.
> 
> So... I used aggregate, and when I couldn't get any work done for
> several hours because the computer was humming and grinding away at
> this problem, I nuked Splus, exported the data, imported it into
> Access (YUK) and ran a query, which took.... well, I don't know, but
> it was less than a second.
> 
> I looked at Hmisc for alternatives: there is a function called
> summarize, but that only works for functions that return >1 value
> (as far as I know), and will not run the function on more than one
> data value (you can give it a matrix but for each combination of the
> grouping variables, it performs the function on all of the values in
> all columns of the matrix corresponding to those rows). Thus,
> summarize is not suitable for what I want to do, without needless
> trickery.
> 
> I realize that aggregate uses tapply, which uses loops, but geez....
> several hours at least, compared to under one second?
> 
> Is there an alternative that does not loop?
> 
> Thanks...Wim
> ======================
> Dr. Wim Kimmerer
> Romberg Tiburon Center
> San Francisco State University
> 3152 Paradise Drive
> Tiburon CA 94920
> Ph. (415) 338-3515
> Fax (415) 435-7120
> http://online.sfsu.edu/~kimmerer/

