
Regression of a large data set.

To: <s-news@lists.biostat.wustl.edu>
Subject: Regression of a large data set.
From: John Thaden <jjthaden@flash.net>
Date: Tue, 25 Sep 2001 11:13:56 -0500
In-reply-to: <NCEHJMAIIKEFPIOEGGPHCELICDAA.bzajdlik@sentex.net>
References: <51F9C42DA15CD311BD220008C707D81903DC824C@usrymx10.merck.com>
I'm trying to fit 26046 observations with a linear model
        y = mu + A + G + AG + CG + error
where A, G and C are factors of 22, 1185, and 10 levels, respectively. Both lm() and glm() choke on this problem, giving (after 35 minutes) the error message
Error in model.matrix.default(Terms, ..: Cannot create - data would have length greater than 268435455 (.Machine$integer.max/sizeof(numeric))
Dumped
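
For scale, here is a rough count of the cells a dense model matrix for this model would need. It is only a sketch under stated assumptions: AG and CG are read as the interactions A:G and C:G, factors are coded with treatment (dummy) contrasts, and C:G is coded as C nested within G because C has no main effect in the model. Other contrast codings would change the exact column count, though probably not the order of magnitude.

```python
# Rough size of a dense model matrix for y = mu + A + G + A:G + C:G.
# Assumptions (not stated in the post): treatment contrasts, and C:G
# coded as C nested within G since C has no main effect.
n_obs = 26046
levels = {"A": 22, "G": 1185, "C": 10}

columns = {
    "mu": 1,
    "A": levels["A"] - 1,
    "G": levels["G"] - 1,
    "A:G": (levels["A"] - 1) * (levels["G"] - 1),
    "C:G": (levels["C"] - 1) * levels["G"],
}

n_cols = sum(columns.values())
n_cells = n_obs * n_cols
limit = 268435455  # the length limit quoted in the error message

print("columns:", n_cols)
print("cells:  ", n_cells)
print("exceeds limit:", n_cells > limit)
```

Under these assumptions the A:G term alone would already need more cells than the quoted limit allows.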

I had also received error messages about object.size limitations, but these I could get past by setting options(object.size) to 1e10. Is there a similarly easy fix for the present error? I observe that 268435455 is 16^7 - 1, so its base-16 logarithm is almost exactly 7. Is this a problem of single-precision math?
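
The expression quoted in the error message can be checked by hand. The sketch below assumes a 32-bit signed integer maximum and an 8-byte numeric (double) type; neither value is stated in the error text, but together they reproduce the number it reports.

```python
# Reproduce the limit named in the error message:
#   .Machine$integer.max / sizeof(numeric)
# Assumptions: integer.max is 2^31 - 1 and a numeric occupies 8 bytes.
integer_max = 2**31 - 1
sizeof_numeric = 8

max_length = integer_max // sizeof_numeric

print(max_length)                # the quoted limit, if the assumptions hold
print(hex(max_length))           # seven hex digits, all f
print(max_length == 16**7 - 1)   # i.e. 2^28 - 1
```

If those assumptions hold, the figure reflects how many 8-byte values fit within a 32-bit byte count rather than the precision of the arithmetic.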

My tools at hand are S-Plus 4.5 for Windows on a P166 with 64 MB of RAM. I also have R, but I don't yet know how to read the data into it (or do much of anything else with it). I may also have access to an old SAS version on a mainframe. Given these tools, is there a way I can do this regression?

I'll post a summary and my sincere thanks.

************************************************************
John J. Thaden, Ph.D., Research Assistant Professor
Department of Geriatrics                  (501) 257-5583
U. Arkansas for Medical Sciences    fax: (501) 257-4822
  mailing & shipping address:        jjthaden@flash.net
       Central Arkansas Veterans Healthcare System
       Research-LR151 (Room GB106)
       4300 West 7th Street
       Little Rock AR 72205 USA
************************************************************

