s-news
[Top] [All Lists]

Re: S-PLUS Vs some other softwares

To: Martin Maechler <maechler@stat.math.ethz.ch>
Subject: Re: S-PLUS Vs some other softwares
From: Frank E Harrell Jr <feh3k@spamcop.net>
Date: Wed, 3 Mar 2004 08:49:53 -0500
Cc: bert_gunter@merck.com, jadhavpr@vcu.edu, s-news@wubios.wustl.edu
In-reply-to: <16453.36324.430447.243818@gargle.gargle.HOWL>
Organization: Vanderbilt University
References: <4FEAC11847CF0C44B119A6DDD6251E9F324FE9@uswsmx20.merck.com> <16453.36324.430447.243818@gargle.gargle.HOWL>
On Wed, 3 Mar 2004 08:48:52 +0100
Martin Maechler <maechler@stat.math.ethz.ch> wrote:

> >>>>> "BertG" == Gunter, Bert <bert_gunter@merck.com>
> >>>>>     on Tue, 2 Mar 2004 08:57:01 -0500 writes:
> 
>     BertG> As Andy said, lmList will probably do what you want,
>     BertG> but as it uses lm(), it may also take a while. If you
>     BertG> wish to do it "by hand" yourself, try by() [which is
>     BertG> a wrapper for tapply()] and use lsfit instead of
>     BertG> lm. As others have said, lm may incur a lot of
>     BertG> overhead. The code would be something like
>  
>     .................
> 
> I think it should be lm.fit() , not lsfit().
> This will use the numerical engine below lm() without
> the overhead of formula -> model.frame -> model.matrix.
> 
> Martin Maechler <maechler@stat.math.ethz.ch>  http://stat.ethz.ch/~maechler/

A slightly faster approach may be to use the following function which is
part of the Hmisc library. The S-Plus version is shown below.  x is a
matrix, y a vector.

lm.fit.qr.bare <- function(x, y, 
                          tolerance = .Machine$single.eps, 
                          intercept=TRUE, xpxi=FALSE) {

  if(intercept) x <- cbind(1,x)
  if(storage.mode(x) != "double") storage.mode(x) <- "double"
  if(storage.mode(y) != "double") storage.mode(y) <- "double"
  dx <- dim(x)
  dn <- dimnames(x)
  qty <- y
  n <- dx[1]
  n1 <- 1:n
  p <- dx[2]
  p1 <- 1:p
  dy <- c(n, 1)
  z <- .Fortran("dqrls",
                         qr = x,
                         as.integer(dx),
                         pivot = as.integer(p1),
                         qraux = double(p),
                         y,
                         as.integer(dy),
                         coef = double(p),
                         residuals = y,
                         qt = qty,
                         tol = as.double(tolerance),
                         double(2 * p),
                         rank = as.integer(p))
  coef <- z$coef
  if(length(dn[[2]])) names(coef) <- dn[[2]]
  res <- z$residuals
  sse <- sum(res^2)
  sst <- sum((y-mean(y))^2)

  res <- list(coefficients=coef, residuals=res, 
              rsquared=1-sse/sst, fitted.values=y-res)
  if(xpxi) {
      R <- (z$qr)[p1, , drop = FALSE]
      R[lower.tri(R)] <- 0
      rinv <- solve(R, diag(length(coef)))
      xpxi <- rinv %*% t(rinv)
    res$xpxi <- xpxi
  }
  res
}

I use this for bootstrapping, etc.

Frank

<Prev in Thread] Current Thread [Next in Thread>