
Re: Applying vector functions to dataframes

To: Thomas Jagger <tjagger@blarg.net>
Subject: Re: Applying vector functions to dataframes
From: Tony Plate <tplate@blackmesacapital.com>
Date: Thu, 28 Jul 2005 11:00:34 -0600
Cc: 'Eric Turkheimer' <ent3c@virginia.edu>, s-news@lists.biostat.wustl.edu
In-reply-to: <20050726164736.9B954F398C@mail.blarg.net>
References: <20050726164736.9B954F398C@mail.blarg.net>
User-agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)

Thomas Jagger wrote:
> Good morning.
> [snip]
>
> However, lapply uses for loops, so you might as well use the for loops
> explicitly (or try R).


Actually, while that is true in R, I believe it is not true in S-PLUS: in S-PLUS, lapply() calls the compiled function "S_qapply". I have (sometimes) noticed considerable differences in memory usage between a "for" loop and the same computation done with lapply(); lapply() seems to reclaim the memory used in expressions more reliably. The memory usage of a "for" loop can be improved by making the body of the loop into a function, but that gets cumbersome.

The bottom line is: in S-PLUS, currently (in at least 6.X and 7.0 for Windows) lapply() appears to do a better job with memory usage than "for" loops (at least I've never seen a case to the contrary).

Here's an example; note how the memory usage increases with each iteration of the "for" loop but stays stable for iterations of lapply().

> x <- matrix(rnorm(500^2), nrow=500, ncol=500)
> y <- x < 0
> z1 <- numeric(10)
> round(object.size(x)/2^20, 1)
[1] 1.9
> round(object.size(y)/2^20, 1)
[1] 1
> z1 <- numeric(10)
> mem.tally.reset()
> for (i in 1:10) {z1[i] <- sum(x + (1 - y)); cat(round(mem.tally.report()[2]/2^20, 1), "\n")}
5.7
6.7
7.6
8.6
9.5
10.5
11.5
12.4
13.4
14.3
> mem.tally.reset()
> z2 <- unlist(lapply(1:10, function(i, x, y) {z <- sum(x + (1 - y)); cat(round(mem.tally.report()[2]/2^20, 1), "\n"); z}, x, y))
5.7
5.7
5.7
5.7
5.7
5.7
5.7
5.7
5.7
5.7
>

Note that the occurrence of this behavior is very dependent upon the exact form of the expression involved. (And yes, it was a huge PITA finding out exactly why a "for" loop with hundreds of lines of code was steadily eating up memory -- I tracked it down to the expression above.)

If there's any generic way of preventing this kind of memory leakage in "for" loops I'd be very interested to hear about it! (Insightful support did not have any suggestions when I last asked them about it.)

For the morbidly curious, the following examples illustrate two things: (1) the success of lapply() in reclaiming memory is not due to the fact that the lapply() call includes an extra user-level function call; and (2) making the body of the loop into a function stops the "for" loop from consuming additional memory on each iteration.

> z1 <- numeric(10)
> mem.tally.reset()
> for (i in 1:10) {z1[i] <- sum(i, x, 1 - y); cat(round(mem.tally.report()[2]/2^20, 1), "\n")}
7.6
8.6
9.5
10.5
11.5
12.4
13.4
14.3
15.3
16.2
> mem.tally.reset()
> z2 <- unlist(lapply(1:10, function(i, x, y) {z <- sum(i, x, (1 - y)); cat(round(mem.tally.report()[2]/2^20, 1), "\n"); z}, x, y))
7.6
7.6
7.6
7.6
7.6
7.6
7.6
7.6
7.6
7.6
> mem.tally.reset()
> {z3 <- unlist(lapply(1:10, sum, x, 1 - y)); cat(round(mem.tally.report()[2]/2^20, 1), "\n")}
7.6
> z4 <- numeric(10)
> mem.tally.reset()
> f4 <- function(i, x, y) sum(i, x, 1 - y)
> for (i in 1:10) {z4[i] <- f4(i, x, y); cat(round(mem.tally.report()[2]/2^20, 1), "\n")}
7.6
7.6
7.6
7.6
7.6
7.6
7.6
7.6
7.6
7.6
> c(all.equal(z1,z2), all.equal(z1,z3), all.equal(z1, z4))
[1] T T T
>
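
The function-wrapping workaround in the f4 example can in principle be extended to a longer loop body that needs to keep several results. What follows is only a minimal sketch of that pattern: the helper name loop.body and the variables totals and biggest are hypothetical, and whether it avoids the memory growth for a given expression and S-PLUS version is an assumption that would need checking with mem.tally.report().

```r
# Hypothetical helper: holds what used to be the body of the "for" loop.
# Temporaries such as 'shifted' are local to this call.
loop.body <- function(i, x, y) {
    shifted <- x + (1 - y)
    # Return every value the loop needs to keep, as a named list
    list(total = sum(shifted), biggest = max(shifted))
}

x <- matrix(rnorm(50^2), nrow=50, ncol=50)
y <- x < 0
n <- 10
totals <- numeric(n)
biggest <- numeric(n)

# The loop itself only calls the helper and stores the results
for (i in 1:n) {
    res <- loop.body(i, x, y)
    totals[i] <- res$total
    biggest[i] <- res$biggest
}
```

The loop then contains nothing but a function call and a few assignments, which is the form that stayed at a constant memory tally in the f4 example.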
