s-news
[Top] [All Lists]

Re: [S] Isoreg Revision

To: s-news@wubios.wustl.edu
Subject: Re: [S] Isoreg Revision
From: Richard Raubertas <rfr@bst.rochester.edu>
Date: Sun, 27 Feb 2000 15:04:59 -0500 (EST)
Cc: rfr@bio2.bst.rochester.edu
Reply-to: Richard Raubertas <rfr@bst.rochester.edu>
Sender: owner-s-news@wubios.wustl.edu

Tim Johnson and Bill Venables have posted an isotonic regression
routine that uses nnls.fit().  An alternative approach is the
pool-adjacent-violators algorithm.  An implementation is

pava <- function (x, wt=rep(1,length(x)))
#  Compute the isotonic regression of numeric vector 'x', with 
#  weights 'wt', with respect to simple order.  The pool-adjacent-
#  violators algorithm is used.  Returns a vector of the same length
#  as 'x' containing the regression.

#  02 Sep 1994 / R.F. Raubertas
{
   n <- length(x)
   if (n <= 1) return (x)
   if (any(is.na(x)) || any(is.na(wt))) {
      stop ("Missing values in 'x' or 'wt' not allowed")
   }
   lvlsets <- (1:n)
   repeat {
      viol <- (as.vector(diff(x)) < 0)  # Find adjacent violators
      if (!(any(viol))) break

      i <- min( (1:(n-1))[viol])        # Pool first pair of violators
      lvl1 <- lvlsets[i]
      lvl2 <- lvlsets[i+1]
      ilvl <- (lvlsets == lvl1 | lvlsets == lvl2)
      x[ilvl] <- sum(x[ilvl]*wt[ilvl]) / sum(wt[ilvl])
      lvlsets[ilvl] <- lvl1
   }
   x
}

(This does not have the functionality of Johnson's and Venables' 
'untie' and 'subset' arguments, but they could certainly be added.)

A disadvantage of isoreg() compared to pava() is that it creates
an n x (n+1) design matrix to pass to nnls.fit(), which could be
a problem for large n.
I did some timing tests using S-Plus 3.4 on a Sun Ultra 1/140 
running Solaris 2.6:

> y <- runif(100)
> ys <- sort(y)
> unix.time(temp<-pava(y))
[1] 0.4400024 0.0000000 0.0000000 0.0000000 0.0000000
> unix.time(temp2<-isoreg(1:100,y,untie="n"))
[1] 0.1600037 0.0000000 0.0000000 0.0000000 0.0000000
> unix.time(temp<-pava(ys))
[1] 0 0 0 0 0
> unix.time(temp2<-isoreg(1:100,ys,untie="n"))
[1] 1.280014 0.000000 2.000000 0.000000 0.000000
> unix.time(temp<-pava(rev(ys)))
[1] 0.4900055 0.0000000 0.0000000 0.0000000 0.0000000
> unix.time(temp2<-isoreg(1:100,rev(ys),untie="n"))
[1] 0.1500092 0.0000000 0.0000000 0.0000000 0.0000000

isoreg() is faster for random or anti-isotonic data, but slows down
substantially when the data are already isotonic (and presumably
when they are close to isotonic).  pava() gets faster when the
data are close to isotonic, because less pooling is needed.

Rich Raubertas
rfr@bst.rochester.edu

-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message:  unsubscribe s-news

<Prev in Thread] Current Thread [Next in Thread>