Thomas Jagger wrote:
Good morning.
[snip]
However, lapply uses for loops, so you might as well use the for loops
explicitly (or try R).
Actually, while that is true in R, I believe it is not true in S-PLUS:
in S-PLUS, lapply() calls the compiled function "S_qapply". I've
sometimes noticed considerable differences in memory usage between a
"for" loop and the same computation done with lapply(). lapply() seems
to reclaim the memory used in expressions more reliably. The memory
usage of a "for" loop can be improved by making the body of the loop
into a function, but that gets cumbersome.
The bottom line is: in S-PLUS, currently (in at least 6.X and 7.0 for
Windows) lapply() appears to do a better job with memory usage than
"for" loops (at least I've never seen a case to the contrary).
Here's an example; note how the memory usage increases with each
iteration of the "for" loop but stays stable for iterations of lapply().
> x <- matrix(rnorm(500^2), nrow=500, ncol=500)
> y <- x < 0
> z1 <- numeric(10)
> round(object.size(x)/2^20, 1)
[1] 1.9
> round(object.size(y)/2^20, 1)
[1] 1
> z1 <- numeric(10)
> mem.tally.reset()
> for (i in 1:10) {z1[i] <- sum(x + (1 - y));
cat(round(mem.tally.report()[2]/2^20, 1), "\n")}
5.7
6.7
7.6
8.6
9.5
10.5
11.5
12.4
13.4
14.3
> mem.tally.reset()
> z2 <- unlist(lapply(1:10, function(i, x, y) {z <- sum(x + (1 - y));
cat(round(mem.tally.report()[2]/2^20, 1), "\n"); z}, x, y))
5.7
5.7
5.7
5.7
5.7
5.7
5.7
5.7
5.7
5.7
>
Note that the occurrence of this behavior depends heavily on the exact
form of the expression involved. (And yes, it was a huge PITA finding
out exactly why a "for" loop with hundreds of lines of code was
steadily eating up memory -- I tracked it down to the expression above.)
If there's any generic way of preventing this kind of memory leakage in
"for" loops I'd be very interested to hear about it! (Insightful
support did not have any suggestions when I last asked them about it.)
For the morbidly curious, the following examples illustrate two things:
(1) the success of lapply() in reclaiming memory is not due to the fact
that the lapply() call includes an extra user-level function call; and
(2) making the body of the loop into a function stops the "for" loop
from accumulating memory.
> z1 <- numeric(10)
> mem.tally.reset()
> for (i in 1:10) {z1[i] <- sum(i, x, 1 - y);
cat(round(mem.tally.report()[2]/2^20, 1), "\n")}
7.6
8.6
9.5
10.5
11.5
12.4
13.4
14.3
15.3
16.2
> mem.tally.reset()
> z2 <- unlist(lapply(1:10, function(i, x, y) {z <- sum(i, x, (1 - y));
cat(round(mem.tally.report()[2]/2^20, 1), "\n"); z}, x, y))
7.6
7.6
7.6
7.6
7.6
7.6
7.6
7.6
7.6
7.6
> mem.tally.reset()
> {z3 <- unlist(lapply(1:10, sum, x, 1 - y));
cat(round(mem.tally.report()[2]/2^20, 1), "\n")}
7.6
> z4 <- numeric(10)
> mem.tally.reset()
> f4 <- function(i, x, y) sum(i, x, 1 - y)
> for (i in 1:10) {z4[i] <- f4(i, x, y);
cat(round(mem.tally.report()[2]/2^20, 1), "\n")}
7.6
7.6
7.6
7.6
7.6
7.6
7.6
7.6
7.6
7.6
> c(all.equal(z1,z2), all.equal(z1,z3), all.equal(z1, z4))
[1] T T T
>
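For anyone who wants to try the same comparison in R, where mem.tally.reset() and mem.tally.report() do not exist, a rough equivalent can be sketched with gc(). This is only a sketch under stated assumptions: the helper name peak.mb is made up for illustration, it assumes the usual 8 bytes per Vcell, and it reports the peak size of the whole R heap since the last reset rather than a tally starting from zero, so the numbers are not directly comparable to the S-PLUS figures above. No claim is made here about what R will print.

```r
# Hypothetical helper (not part of R or S-PLUS): peak vector-heap usage
# in MB since the last gc(reset = TRUE), assuming 8 bytes per Vcell.
peak.mb <- function() gc()["Vcells", "max used"] * 8 / 2^20

x <- matrix(rnorm(500^2), nrow = 500, ncol = 500)
y <- x < 0

# Explicit "for" loop, printing the peak after each iteration.
z1 <- numeric(10)
invisible(gc(reset = TRUE))   # reset the "max used" statistics
for (i in 1:10) {
  z1[i] <- sum(x + (1 - y))
  cat(round(peak.mb(), 1), "\n")
}

# The same computation through lapply().
invisible(gc(reset = TRUE))
z2 <- unlist(lapply(1:10, function(i, x, y) {
  z <- sum(x + (1 - y))
  cat(round(peak.mb(), 1), "\n")
  z
}, x, y))

all.equal(z1, z2)
```

If the printed peak stays flat across iterations in both halves, that would suggest the growth seen above is specific to the S-PLUS "for" loop.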