To anyone who can stand one more:
I replied privately to the original poster, but there have been so many
suboptimal/complicated solutions presented, that I feel I need to post it
publicly.
First, any solution with "apply" in it involved implicit looping, although
it may succeed in transferring the looping to C code enough that performance
penalties are not important. Still, I feel that it violates the spirit of
the query. Moreover, the following clever trick, which is due to Bill
Venables (a long time ago) will run very fast --because it **is** all C
code, effectively -- and is very simple. It also illustrates a wise maxim
from Brian Ripley: the biggest factor influencing execution speed is usually
programming skill. The tradeoff here is storage space for time.
Anyway, the idea is simple: For running medians of length k, stack up k
suitably lagged vectors and then create an indexing column that indexes the
first element of each of the vectors with a 1, the second with a 2, and so
forth. Hence the running medians are the medians of those elements of the
stacked vectors with the same index value. Then simply use order() to sort
the two vectors, first on the index column, which ranks all the values of
each group together, and then by the data vector, which will order the
values within each group to break the "ties." Then a simple indexing
operation will extract the medians.
Here's the code for running medians of length 3 -- I've lopped off end
values.
## z the data in sequential order
## no missings (though this isn't essential)
n<-length(z)
z.stack<- c(z[(1:(n-2)],z[2:(n-1)],z[3:n])
indx<-rep(1:(n-2),3)
z.ord<-z.stack(order(indx,z.stack))
runmeds<-z.ord[seq(2,3*(n-3),by=3)
C'est ca!
Cheers to all,
Bert Gunter
Biometrics Research RY 84-16
Merck & Company
P.O. Box 2000
Rahway, NJ 07065-0900
Phone: (732) 594-7765
mailto: bert_gunter@merck.com
"The business of the statistician is to catalyze the scientific learning
process." -- George E.P. Box
-----Original Message-----
From: Warnes, Gregory R [mailto:gregory_r_warnes@groton.pfizer.com]
Sent: Tuesday, January 21, 2003 11:56 AM
To: 'brahm@alum.mit.edu'; s-news@lists.biostat.wustl.edu
Cc: Markus.Loecher@scr.siemens.com
Subject: Re: [S] moving median in S without looping
And to complete the bevy of options, here is my code (part of the R gregmisc
package) for computing running values of arbitrary functions. It also works
fine in S-Plus.
-Greg
--- start of code ---
"running" <- function( X, fun=mean, width=min(length(X),20),
allow.fewer=FALSE,...)
running2(X=X, fun=fun, width=width, allow.fewer=allow.fewer, ...)
"running2" <- function( X, Y=NULL, fun=mean, width=min(length(X),20),
allow.fewer=FALSE,...)
{
n <- length(X)
from <- sapply( (1:n) - width + 1, function(x) max(x,1) )
to <- 1:n
elements <- apply(cbind(from,to), 1,function(x) seq(x[1], x[2]) )
if(is.matrix(elements))
elements <- as.data.frame(elements)
if(is.null(Y)) # univariate
{
funct <- function(which,what,fun,...) fun(what[which],...)
Xvar <- sapply(elements, funct, what=X, fun=fun, ...)
}
else # bivariate
{
funct <- function(which,XX,YY,fun,...) fun(XX[which],YY[which], ...)
Xvar <- sapply(elements, funct, XX=X, YY=Y, fun=fun, ...)
}
names(Xvar) <- paste(from,to,sep=":")
if(!allow.fewer)
Xvar[1:(width-1)] <- NA
return(Xvar)
}
--- end of code ---
--- start of help documentation ---
Apply a Function Over Adjacent Subsets of a Vector
Description:
Applies a function over subsets of the vector(s) formed by taking
a fixed number of previous points.
Usage:
running(X, fun=mean, width=min(length(X),20), allow.fewer=FALSE,...)
running2(X,Y, fun=mean, width=min(length(X),20), allow.fewer=FALSE,...)
Arguments:
X: data vector
Y: data vector
fun: function to apply. Default is `mean'
width: integer giving the number of vector elements to include in
the subsets. Defaults to the lesser of the length of the
data and 20 elements.
allow.fewer: Boolean indicating whether the function should be computed
for initial subsets with fewer than `width' points
...: parameters to be passed to `fun'
Details:
`running' applies the specified univariate function to sequential
windows on `X'.
`running2' applies the specified bivariate function to sequential
windows of `X' and `Y'.
Value:
Vector containg the results of applying the function `fun' to the
subsets of `X' (`running') or `X' and `Y' (running2).
Author(s):
Gregory R. Warnes Gregory_R_Warnes@groton.pfizer.com
Examples:
running(1:20,width=5)
plot(1:20, running(1:20,width=5))
plot(1:20, running(1:20,width=5, allow.fewer=TRUE))
# plot running mean and central 2 standard deviation range
# estimated by last 40 observations
dat <- rnorm(500, sd=1 + (1:500)/500 )
plot(dat)
fun <- function(x,sign=1) mean(x) + sign * sqrt(var(x))
lines(running(dat,width=50,fun=mean,allow=TRUE),col="blue")
lines(running(dat,width=50,fun=fun, sign=-1, allow=TRUE),col="red")
lines(running(dat,width=50,fun=fun, sign=1, allow=TRUE),col="red")
# plot running running correlation estimated by last 40 observations
# along with true (local) correlation
X <- rnorm(500, sd=1)
Y <- X + rnorm(500, sd=(1:500)/500)
rho <- 1/ ( 1 * sqrt(1 + (1:500)/500) ) # true
plot(running2(X,Y,width=20,fun=cor),col="red",type="s")
lines(rho,type="l",col="blue")
--- end of documentation ---
LEGAL NOTICE
Unless expressly stated otherwise, this message is confidential and may be
privileged. It is intended for the addressee(s) only. Access to this E-mail
by anyone else is unauthorized. If you are not an addressee, any disclosure
or copying of the contents of this E-mail or any action taken (or not taken)
in reliance on it is unauthorized and may be unlawful. If you are not an
addressee, please inform the sender immediately.
--------------------------------------------------------------------
This message was distributed by s-news@lists.biostat.wustl.edu. To
unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
the BODY of the message: unsubscribe s-news
------------------------------------------------------------------------------
Notice: This e-mail message, together with any attachments, contains
information of Merck & Co., Inc. (Whitehouse Station, New Jersey, USA) that may
be confidential, proprietary copyrighted and/or legally privileged, and is
intended solely for the use of the individual or entity named on this message.
If you are not the intended recipient, and have received this message in error,
please immediately return this by e-mail and then delete it.
==============================================================================
|