This is a long-overdue summary of the responses to a question I
posted maybe a year ago.
The problem was to sum, without explicit looping, the diagonals of a
matrix that run from southwest to northeast. For example, for the
matrix M =
a b c d
b c d e
c d e f
d e f g
I would want the vector
V = (a 2b 3c 4d 3e 2f g).
Answers:
The tapply family of functions gives a charmingly simple solution. There
were two basic solutions: Dave Krantz's and everybody else's. I will give
Krantz's later. Everyone else used tapply:
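V <- tapply(M, row(M) + col(M), sum).
This works because row(M) + col(M) is constant along each
southwest-to-northeast diagonal, so it serves as a grouping index.
The matrix does not have to be square.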
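As a small illustration of the tapply approach (a sketch in R, assuming it behaves like S here; the numeric values 1 to 7 standing in for a to g are my own choice), the example matrix and a non-square one can be checked like this:

```r
# Stand-in for the example matrix: a..g replaced by 1..7, so M[i, j] = i + j - 1.
M <- outer(1:4, 1:4, function(i, j) i + j - 1)

# row(M) + col(M) is constant along each southwest-to-northeast diagonal.
V <- tapply(M, row(M) + col(M), sum)
print(V)   # expected pattern: a, 2b, 3c, 4d, 3e, 2f, g

# A non-square (2 x 3) matrix works the same way.
N <- matrix(1:6, nrow = 2)
W <- tapply(N, row(N) + col(N), sum)
print(W)
```

With these stand-in values the square case should give 1, 4, 9, 16, 15, 12, 7, which matches (a 2b 3c 4d 3e 2f g).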
This response was given by Alan Zaslavsky, Tony Plate, Terry
Therneau, Richard Becker, Sam Buttrey, Douglas Bates, Phil Spector,
pburns, Christian Keller, and Franz-Josef Mueter; I may have missed
some. The first two responses, by the way, came within a few minutes
of each other -- and within a few minutes of when I had posted the
question -- from the U.S. East coast (at 8:15 a.m.) and from New
Zealand (at 1:24 a.m. -- the next day!).
Bill Venables pointed out that wrapping the call in as.vector() will
remove the names attribute from the result:
V <- as.vector(tapply(M, row(M) + col(M), sum)).
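To see what as.vector() changes, here is a minimal sketch in R (the variable names V_named and V_plain are hypothetical, and the numeric stand-in matrix is my own):

```r
# Stand-in matrix with M[i, j] = i + j - 1 (a..g replaced by 1..7).
M <- outer(1:4, 1:4, function(i, j) i + j - 1)

V_named <- tapply(M, row(M) + col(M), sum)
V_plain <- as.vector(V_named)

print(names(V_named))   # the distinct row + col values, as character strings
print(names(V_plain))   # NULL once as.vector() has been applied
```

The names are just the group labels (the values of row + col), so dropping them loses nothing if only the sums are wanted.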
Venables added, concerning tapply:
> This is pretty extravagant if M is a large matrix, though, since at
> some stage you are working with three objects (at least) of the same
> size as M.
>
> No pure S solution seems particularly good to me if the matrix is
> really large. If it were an operation I had to do often for very large
> matrices, I would consider going to C and dynamic loading.
Rolf Turner supplied an earlier posting from Doug Bates for summing the
diagonals in the other direction:
V <- tapply(M, row(M) - col(M), sum).
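A quick sketch of the other direction in R, using a small non-symmetric matrix of my own choosing so that the ordering of the result is visible. The groups come out in increasing order of row - col, which means the first sum belongs to the upper-right corner:

```r
# 2 x 3 example:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
N <- matrix(1:6, nrow = 2)

# row(N) - col(N) is constant along each northwest-to-southeast diagonal.
D <- tapply(N, row(N) - col(N), sum)
print(D)
```

Here the groups are labelled -2, -1, 0, 1 and the sums should be 5, 3 + 6, 1 + 4, and 2.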
Dave Krantz supplied the following:
> One way to solve your problem is
sapply(split(M, row(M)+col(M)), sum).
> I think that's pretty efficient, because it implicitly calls a version
> of lapply() that has become quite efficient. Years ago, I would have
> used the split() function to create a list by diagonals, as above, then
> would have struggled to avoid implicit looping at the exec level
> by turning the list into a matrix (by adding appropriate zeros), and
> then would have summed via matrix multiplication with a vector of 1's;
> but I believe such maneuvers might no longer be needed, since sapply()
> works well enough even on long lists. (Others will know better
> than I.)
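The two routes Krantz describes can be sketched as follows in R. This is my own reading of his description, not his code: the names diags, padded, V_sapply and V_matmul are hypothetical, and the padding step itself uses sapply(), so it is only meant to show the shape of the older zero-padding idea.

```r
# Stand-in matrix with M[i, j] = i + j - 1 (a..g replaced by 1..7).
M <- outer(1:4, 1:4, function(i, j) i + j - 1)

# Current route: split into a list of diagonals, then sum each one.
diags <- split(M, row(M) + col(M))
V_sapply <- sapply(diags, sum)

# Older route: pad each diagonal with zeros to a common length, bind the
# padded diagonals into a matrix (one column per diagonal), and sum the
# columns by multiplying with a vector of 1's.
len <- max(sapply(diags, length))
padded <- sapply(diags, function(d) c(d, rep(0, len - length(d))))
V_matmul <- drop(rep(1, len) %*% padded)

print(V_sapply)
print(V_matmul)
```

Both vectors should agree with the tapply result.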
My memory is that a quick test on a large matrix showed Krantz's
version to be an improvement on tapply, 2 to 3 times faster at the
300x300 or 500x500 level. I had meant to test the performance more
systematically, but never had the time.
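For anyone who wants to repeat the comparison, a rough timing sketch in R might look like the following. The matrix size, the random data and the variable names are my own assumptions, and the timings will depend on the S or R version and the machine, so the 2-to-3-times figure above may not reproduce.

```r
set.seed(1)
n <- 300
M <- matrix(runif(n * n), nrow = n)
idx <- row(M) + col(M)   # grouping index shared by both versions

t_tapply <- system.time(v1 <- as.vector(tapply(M, idx, sum)))
t_split  <- system.time(v2 <- as.vector(sapply(split(M, idx), sum)))

# Both versions should produce the same 2n - 1 diagonal sums.
stopifnot(isTRUE(all.equal(v1, v2)))

cat("tapply elapsed:      ", t_tapply[["elapsed"]], "\n")
cat("split/sapply elapsed:", t_split[["elapsed"]], "\n")
```

A single run at this size is noisy; repeating each call several times and comparing totals would give a steadier picture.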
Thanks to all--
David Pattison
David.H.Pattison@ssa.gov
Social Security Administration