Steve Karmesin wrote:
As others have said, what apply has to do in this case is loop over
the 900,000 cases and do a 'sum' over three elements each time. In
this case the overhead of calling an S+ function totally swamps the
numeric operations.
A little more investigation (together with office mate Tony Plate)
provides some insight.
Using mem.tally.reset() and mem.tally.report() shows that for this case
it is allocating a whopping 1280 bytes for each call to 'sum'.
Just touching that much memory is going to be slow. So why would it do
that? Looking at the definition of the apply function shows that it is
allocating a general list for the result, not a vector-based array or
matrix.
Why? It has a shortcut that lets it use efficient matrices if the input
is a 2D matrix, but this one is 3D, so it uses the general code, which
is much, much slower and uses a lot more memory.
If you collapse the first two dimensions of the array the times are
stable at <80usec per call to sum and it allocates 8 bytes per call,
which is just the amount of space needed.
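
A minimal sketch of what collapsing the first two dimensions could look like. It is written in R syntax on the assumption that the same dim<- assignment behaves the same way in S-PLUS. The array sizes are hypothetical stand-ins, since the real data had 900,000 cells of three elements each.

```r
# Hypothetical small array; the last dimension holds the 3 elements to sum.
x <- array(seq_len(4 * 5 * 3), dim = c(4, 5, 3))

# General 3D path: one call to sum for each (i, j) cell.
s3 <- apply(x, c(1, 2), sum)

# Collapse the first two dimensions so that apply sees a 2D matrix.
x2 <- x
dim(x2) <- c(prod(dim(x)[1:2]), dim(x)[3])
s2 <- apply(x2, 1, sum)

# Put the original leading dimensions back on the result.
dim(s2) <- dim(x)[1:2]
```

Because the data are stored in column-major order, resetting dim only relabels the same memory, so s2 ends up holding the same values as s3.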
Interestingly, the R code seems to always build a list, yet it takes only about 15usec
per call. Somehow the underlying function call and perhaps list storage
mechanisms are more efficient there.
For comparison, using rowSums requires the same 8 bytes per call
to store the result and about 0.1 usec per call, since the whole
evaluation is then in C and the S-plus function 'sum' is never actually
called.
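
For illustration, a rough sketch of the rowSums alternative in R syntax. The dims argument is used as it exists in R's rowSums; I am assuming the S-PLUS version accepts it in the same way. The array sizes are again hypothetical, and the timings will vary by machine.

```r
# Hypothetical array: 200 * 300 = 60,000 cells, 3 elements each.
x <- array(runif(200 * 300 * 3), dim = c(200, 300, 3))

# Interpreted path: the function 'sum' is called once per cell.
t_apply <- system.time(r_apply <- apply(x, c(1, 2), sum))

# Compiled path: dims = 2 keeps the first two dimensions and sums over the rest.
t_fast <- system.time(r_fast <- rowSums(x, dims = 2))

# The two results should agree up to floating-point rounding.
same <- isTRUE(all.equal(r_apply, r_fast))
```

Comparing t_apply with t_fast should show the gap between the per-call function overhead and the single pass in C.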
-Steve Karmesin