On Thu, 24 Feb 2000, Samantha Low Choy wrote:
>
> Hi All
>
> I was wondering whether anybody had tried running an analysis,
> such as a linear model using lm(),
> on 1,500,000 responses with,
> say 50-80 covariates?
Well, let's see. That's a data matrix (assuming continuous covariates
stored as 8-byte doubles) of 960MB: 1,500,000 x 80 x 8 bytes. And you
will almost certainly need more than one copy, and S has a design limit
of 2GB of objects (at least on a 32-bit machine, and 64-bit machines are
not currently supported, I think). So I guess no one has *succeeded*,
using lm.
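As a rough check of these figures, the Python sketch below computes the size of one copy of the data matrix for 50 and 80 covariates, and how many full copies fit under the object limit. It assumes 8-byte doubles and reads the 2Gb limit as 2**31 bytes (an assumption); it ignores the response, the QR decomposition and everything else lm would allocate.

```python
BYTES_PER_DOUBLE = 8          # continuous covariates stored as IEEE 754 doubles
LIMIT_BYTES = 2 * 1024 ** 3   # the 2Gb object limit, read here as 2**31 bytes (assumption)

def matrix_bytes(n_rows: int, n_cols: int) -> int:
    """Bytes needed for one dense n_rows x n_cols matrix of doubles."""
    return n_rows * n_cols * BYTES_PER_DOUBLE

n_responses = 1_500_000
for p in (50, 80):
    size = matrix_bytes(n_responses, p)
    print(f"{p} covariates: {size / 1e6:.0f} MB per copy, "
          f"{LIMIT_BYTES // size} full copies fit under the limit")
```

This prints 600 MB and 3 copies for 50 covariates, and 960 MB and only 2 copies for 80 covariates, before lm has allocated anything else.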
There are other ways, though, to load in the data in blocks and build
up an answer. But seriously, you don't need anything like that much data
to build up an idea of the patterns on 80 covariates. Start with a 1%
sample, say.
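The message does not say which block-wise method is meant. One standard approach, sketched here in Python as an assumption, accumulates the cross-products X'X and X'y one block at a time, so memory grows with the number of covariates rather than the number of responses, and solves the normal equations at the end. Forming X'X squares the condition number, so a numerically safer variant updates a QR decomposition block by block instead. The simulated data, the block size and the helper names (`in_blocks`, `accumulate`, `bernoulli_sample`) are hypothetical; the last part fits the same model on a 1% Bernoulli sample of the stream.

```python
import random
from typing import Iterable, Iterator, List, Sequence, Tuple

Row = Tuple[Sequence[float], float]  # (covariates, response); intercept added below

def in_blocks(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield lists of at most `size` rows, standing in for reading a file in chunks."""
    block: List[Row] = []
    for row in rows:
        block.append(row)
        if len(block) == size:
            yield block
            block = []
    if block:
        yield block

def accumulate(blocks: Iterable[List[Row]], p: int):
    """Sum X'X and X'y over blocks; memory is O(p^2), independent of n."""
    k = p + 1  # one extra column for the intercept
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    n = 0
    for block in blocks:
        for covariates, y in block:
            x = [1.0, *covariates]
            for i in range(k):
                xty[i] += x[i] * y
                for j in range(k):
                    xtx[i][j] += x[i] * x[j]
            n += 1
    return xtx, xty, n

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("X'X is singular; are some covariates collinear?")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * beta[c] for c in range(r + 1, k))
        beta[r] = (m[r][k] - s) / m[r][r]
    return beta

def simulate(n: int, seed: int = 0) -> Iterator[Row]:
    """Hypothetical data: y = 2 + 3*x1 - x2 + N(0, 0.1^2) noise."""
    rng = random.Random(seed)
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        yield (x1, x2), 2.0 + 3.0 * x1 - x2 + rng.gauss(0.0, 0.1)

def bernoulli_sample(rows: Iterable[Row], fraction: float, seed: int = 1) -> Iterator[Row]:
    """Keep each row independently with probability `fraction`."""
    rng = random.Random(seed)
    return (row for row in rows if rng.random() < fraction)

# Full data, read 10,000 rows at a time.
xtx, xty, n = accumulate(in_blocks(simulate(100_000), 10_000), p=2)
beta_full = solve(xtx, xty)

# A 1% sample of the same stream.
xtx_s, xty_s, n_s = accumulate(in_blocks(bernoulli_sample(simulate(100_000), 0.01), 10_000), p=2)
beta_sample = solve(xtx_s, xty_s)

print(n, [round(b, 3) for b in beta_full])
print(n_s, [round(b, 3) for b in beta_sample])
```

Both fits should land close to the coefficients 2, 3 and -1 used to simulate the data; the sample fit is noisier but touches only about a thousand rows.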
> I guess some pre-processing to decrease number of covariates may bring
> them down to about 20-30.
> And some aggregation of the responses could reduce their number by
> factors of 16 or 100.
>
> But I thought I would see how far people had pushed Splus (or R) in
> terms of sample sizes,
> before killing our machine!
It is worth thinking more in terms of what has statistical value.
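The aggregation idea in the question loses nothing for the coefficients when rows share exactly the same covariate values: replacing each group by its mean response, weighted by the group size, gives the same least-squares estimates as the full fit. The Python sketch below checks this for one hypothetical discrete covariate with 100 levels (a reduction by a factor of 100 in the number of rows) on simulated data. Only the point estimates carry over unchanged; the residual variance, and hence the standard errors, would also need the within-group sums of squares.

```python
import random
from collections import defaultdict

def wls_line(xs, ys, ws):
    """Weighted least-squares intercept and slope for y = a + b*x."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def aggregate(xs, ys):
    """Collapse rows with identical x into (x, mean y, count)."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    keys = sorted(groups)
    means = [sum(groups[k]) / len(groups[k]) for k in keys]
    counts = [len(groups[k]) for k in keys]
    return keys, means, counts

rng = random.Random(42)
xs = [rng.randrange(100) for _ in range(10_000)]  # hypothetical covariate, 100 levels
ys = [1.5 + 0.2 * x + rng.gauss(0.0, 1.0) for x in xs]

full = wls_line(xs, ys, [1.0] * len(xs))          # ordinary fit on every row
gx, gy, gw = aggregate(xs, ys)
grouped = wls_line(gx, gy, gw)                    # weighted fit on group means
print(len(xs), "rows ->", len(gx), "groups")
print(full, grouped)  # equal up to floating-point rounding
```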
--
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.