As others have said, what apply has to do in this case is loop over the
900,000 cases and do a 'sum' over three elements each time. In this
case the overhead of calling an S+ function totally swamps the numeric
operations.
Doing this on smaller datasets (300x30x3) on my machine (2CPU, 3GHz Xeon
running Windows 2000 and S-Plus 6.1) shows an overhead of about 140
microseconds per call to sum, so I would expect it to take
140*1e-6*9e5=126 seconds.
The thing is, it is worse than this, because the per-call cost grows
with the size of the array. If I do a case with 900x90x3 it takes 300
usec per 'sum'.
R is fairly stable at just under 15 usec per 'sum' on my machine.
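For anyone who wants to repeat the measurement, here is a minimal sketch
of how a per-call figure like the ones above can be estimated in R. The
names a, n.calls and usec.per.call are just illustrative, and the timing
function in S-PLUS may differ from R's system.time, so treat this as a
rough recipe rather than the exact script I ran.

```r
# Illustrative test array with the same shape as the 300x30x3 case
dims <- c(300, 30, 3)
a <- array(runif(prod(dims)), dim = dims)

# apply() over margins 1 and 2 calls sum() once per (i, j) cell
n.calls <- prod(dims[1:2])   # 300 * 30 calls

# Time the whole apply() and keep the wall-clock component
elapsed <- system.time(res <- apply(a, c(1, 2), sum))[["elapsed"]]

# Rough per-call cost in microseconds (includes apply's own bookkeeping)
usec.per.call <- elapsed / n.calls * 1e6
```

Repeating this for several array sizes shows whether the per-call cost
stays flat or grows with the data.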
In many cases when benchmarking ever larger arrays, slowdowns in
per-element times are due to cache effects, but the numbers here are so
much larger than any conceivable memory bandwidth cost that I don't
think that is the cause. It seems most likely to be a memory
management effect -- perhaps S is allocating and deallocating a bunch of
things it doesn't have to on each function invocation?
In some systematic tests that run through different sizes repeatedly,
I'm getting some strange hysteresis effects in the timings, which
makes me hypothesize that the issue is memory management, but I'm
just not sure what I would do if I were trying to soak up that much
time per invocation.
-Steve Karmesin
David L Lorenz wrote:
Hi,
I ran into an interesting question from one of our users. He had an array
of about 3000 by 300 by 3. He tried to use apply to sum the last dimension:
result <- apply(array, c(1,2), sum)
I'm not sure he was ever able to get the result. He was surprised
because he could use apply over different dimensions and had no problem:
wrong.result <- apply(array, c(2,3), sum)
I suggested that he simply break down the problem into a simple
summation:
result <- array[,,1] + array[,,2] + array[,,3]
That executed very fast.
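A small sketch, in R, of why the slice-wise sum can stand in for the
apply call: both give the same matrix, but the second needs only three
vectorized additions instead of one call to sum per cell. The array
name a and its dimensions are illustrative (a is used rather than array
to avoid masking the array function), and the rowSums line assumes R's
dims argument, which may not behave the same way in S-PLUS.

```r
# Illustrative array; the last dimension has length 3 as in the question
a <- array(runif(30 * 20 * 3), dim = c(30, 20, 3))

by.apply   <- apply(a, c(1, 2), sum)            # one sum() call per cell
by.slices  <- a[, , 1] + a[, , 2] + a[, , 3]    # three vectorized additions
by.rowsums <- rowSums(a, dims = 2)              # R: sums over the 3rd dimension

# Check that the three approaches agree up to floating-point tolerance
same.slices  <- all.equal(by.apply, by.slices)
same.rowsums <- all.equal(by.apply, by.rowsums)
```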
My question is "Has anybody constructed a list of functions that do not
scale well under certain circumstances?" I remember seeing something
within the last year about outer being very slow for long vectors, and
clearly there are some problems with apply.
Thanks.
Dave