s-news

Re: scalability

To: "'David L Lorenz'" <lorenz@usgs.gov>, s-news@lists.biostat.wustl.edu
Subject: Re: scalability
From: "Gunter, Bert" <bert_gunter@merck.com>
Date: Fri, 26 Mar 2004 16:26:48 -0500
Get Venables and Ripley's book "S Programming". They discuss these issues in
detail (as does the S-PLUS documentation).

Basically, anything that loops in S rather than in the underlying C or
Fortran code will not scale well; nor will recursion, for similar reasons
(high overhead in creating an evaluation frame for each recursion level). All
the apply() functions are thinly disguised S loops. apply() over c(1,2)
requires 3000 x 300 = 900,000 calls to sum(), each of which calls C or
Fortran to sum just 3 elements. apply() over c(2,3) requires only
300 x 3 = 900 calls to sum(), each of which calls C or Fortran to sum
3000 elements, which happens very fast.

So your colleague should not have been surprised.
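A small sketch of the call counting, written in R, which shares S's apply() semantics. It assumes S-PLUS behaves the same way. The array is a hypothetical 30 x 20 x 3 stand-in for the 3000 x 300 x 3 one, and counting.sum() is a hypothetical wrapper that tallies how many times apply() invokes it:

```r
## Hypothetical stand-in for the 3000 x 300 x 3 array in the question.
x <- array(as.numeric(1:(30 * 20 * 3)), dim = c(30, 20, 3))

## Hypothetical wrapper: counts the interpreted (S-level) calls apply() makes.
counting.sum <- function(v) {
  n.calls <<- n.calls + 1   # each call also creates a new evaluation frame
  sum(v)                    # the summing itself is done in compiled code
}

n.calls <- 0
r12 <- apply(x, c(1, 2), counting.sum)
calls.12 <- n.calls         # 30 * 20 = 600 calls, 3 elements each

n.calls <- 0
r23 <- apply(x, c(2, 3), counting.sum)
calls.23 <- n.calls         # 20 * 3 = 60 calls, 30 elements each

## The number of calls is the product of the MARGIN extents, so for the
## full-size array:
big <- c(3000, 300, 3)
prod(big[c(1, 2)])          # 3000 * 300 = 900,000 calls to sum()
prod(big[c(2, 3)])          # 300 * 3 = 900 calls to sum()
```

The per-call overhead (argument matching, frame creation) is paid 900,000 times in the first case and only 900 times in the second. That is why the two calls behave so differently even though they touch the same 2.7 million numbers.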

Cheers,
Bert Gunter
Biometrics Research RY 33-300
Merck & Company
P.O. Box 2000
Rahway, NJ 07065-0900
Phone: (732) 594-7765
mailto: bert_gunter@merck.com

"The business of the statistician is to catalyze the scientific learning
process."      -- George E.P. Box


-----Original Message-----
From: s-news-owner@lists.biostat.wustl.edu
[mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of David L Lorenz
Sent: Friday, March 26, 2004 3:23 PM
To: s-news@lists.biostat.wustl.edu
Subject: [S] scalability


Hi,
  I ran into an interesting question from one of our users. He had an array
of about 3000 by 300 by 3. He tried to use apply to sum over the last dimension:

result <- apply(array, c(1,2), sum)

  I'm not sure he was ever able to get the result.  He was surprised
because he could use apply over different dimensions and had no problem:

wrong.result <- apply(array, c(2,3), sum)

  I suggested that he break the problem down into a simple
summation:

result <- array[,,1] + array[,,2] + array[,,3]

  That executed very fast.
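Two ways to generalize this workaround to a third dimension of any length, sketched in R on the assumption that S-PLUS behaves the same. sum.over.dim3() and sum.over.dim3.mm() are hypothetical names. The first keeps the S-level loop over the short third dimension only. The second relies on column-major array storage to do the whole sum in a single matrix multiply:

```r
## Loop over the (short) third dimension only: each "+" is one
## vectorized operation on a whole d1 x d2 slice.
sum.over.dim3 <- function(x) {
  d <- dim(x)
  result <- x[, , 1, drop = FALSE]      # keep dims even if d1 or d2 is 1
  if (d[3] > 1)
    for (k in 2:d[3]) result <- result + x[, , k, drop = FALSE]
  dim(result) <- d[1:2]                 # drop the trailing length-1 dim
  result
}

## Single call into compiled code: arrays are stored in column-major
## order, so column k of matrix(x, ncol = d[3]) is the slice x[, , k].
## Multiplying by a vector of ones adds the slices together.
sum.over.dim3.mm <- function(x) {
  d <- dim(x)
  array(matrix(x, ncol = d[3]) %*% rep(1, d[3]), dim = d[1:2])
}

## Small check against the slow apply() version (hypothetical data).
x <- array(as.numeric(1:(5 * 4 * 3)), dim = c(5, 4, 3))
ref <- apply(x, c(1, 2), sum)
all.equal(sum.over.dim3(x), ref)
all.equal(sum.over.dim3.mm(x), ref)
```

With 3 slices the loop version makes just 2 trips through S-level code, each a vectorized "+" over 900,000 elements. In R, rowSums(x, dims = 2) performs the same summation in compiled code.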

  My question is "Has anybody constructed a list of functions that do not
scale well under certain circumstances?"  I remember seeing something
within the last year about outer being very slow for long vectors and
clearly, there are some problems with apply.
  Thanks.
Dave

