Summary: Regression of a large data set

To: <s-news@lists.biostat.wustl.edu>
Subject: Summary: Regression of a large data set
From: John Thaden <jjthaden@flash.net>
Date: Tue, 25 Sep 2001 16:59:30 -0500
Sincerest thanks to Charles Berry, David Smith and Douglas Bates for pointing out that my model is severely overdetermined, by some 11,000 degrees of freedom, and that a design matrix for it would occupy over a gigabyte of storage! Thanks also to Peter Sherer for steering me to "S Programming" by Venables & Ripley, which he says has two functions for handling regression of larger data sets.
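
As a rough check on those two figures, here is a minimal Python sketch of the arithmetic, using the observation count and factor sizes from my original message below. It assumes a coding in which a k-level factor contributes k - 1 columns and an interaction contributes the product of its factors' column counts, and it assumes a dense matrix of 8-byte doubles. The software's actual contrast coding may differ slightly, so treat the totals as approximate. The variable names are illustrative only.

```python
# Sizes taken from the original message.
n_obs = 26046
levels = {"A": 22, "G": 1185, "C": 10}

# Assumption: a k-level factor contributes k - 1 columns.
cols = {name: k - 1 for name, k in levels.items()}

# Columns per term of y = mu + A + G + AG + CG + error.
# Assumption: an interaction contributes the product of its factors' columns.
terms = {
    "mu": 1,
    "A": cols["A"],
    "G": cols["G"],
    "AG": cols["A"] * cols["G"],
    "CG": cols["C"] * cols["G"],
}

n_params = sum(terms.values())
excess = n_params - n_obs  # parameters beyond the number of observations

# Assumption: dense storage at 8 bytes per entry.
dense_bytes = n_obs * n_params * 8

print("columns per term:", terms)
print("total parameters:", n_params)
print("parameters minus observations:", excess)
print("dense design matrix, GB: %.2f" % (dense_bytes / 1e9))
```

Under these assumptions the model has about 36,700 parameters against 26,046 observations, which is where the figure of roughly 11,000 comes from, and the dense design matrix runs to several gigabytes.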

Douglas asks if I really want to fit a fixed-effects model involving a factor with over 1000 levels (my G factor). I think the answer is 'yes'. That factor represents individual spots on an array, each itself representing a gene. I'm basing my analysis on that of Kerr & Churchill (2001) PNAS 98:8961 (though the overdetermined model was entirely my doing!). Regression is their first step toward a cluster analysis that includes bootstrapping to assess reliability.

-John Thaden

### My Original Message #####
I'm trying to fit 26046 observations with a linear model
        y = mu + A + G + AG + CG + error
where A, G and C are factors of 22, 1185, and 10 levels, respectively. Both lm() and glm() choke on this problem ...

************************************************************
John J. Thaden, Ph.D., Research Assistant Professor
Department of Geriatrics                  (501) 257-5583
U. Arkansas for Medical Sciences    fax: (501) 257-4822
************************************************************

