Re: S-PLUS Vs some other softwares

To: 'Pravin' <jadhavpr@vcu.edu>, s-news@wubios.wustl.edu
Subject: Re: S-PLUS Vs some other softwares
From: "Gunter, Bert" <bert_gunter@merck.com>
Date: Tue, 2 Mar 2004 08:57:01 -0500
As Andy said, lmList will probably do what you want, but because it uses lm(), it may also take a while. If you wish to do it "by hand", try by() [which is a wrapper for tapply()] and use lsfit() instead of lm(). As others have said, lm() may incur a lot of overhead. The code would be something like:
 
results <- by(the.data.frame, the.data.frame$patientID, function(z) coef(lsfit(z$x, z$y))[2])
 
where x and y are the x and y values for each patient (note they are in the reverse order of lm() syntax, which would be lm(y~x)).
 
results is an object of class "by" -- essentially a list of slopes whose length is the number of distinct patient IDs, plus some additional attribute information that you can probably ignore.
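As a rough illustration of the by()/lsfit() approach described above, here is a minimal sketch written against R; S-PLUS may behave slightly differently. The data are simulated, and the names n.patients, n.samples, slopes, first and lm.slope are made up for the example. It also shows one way to turn the "by" object into a plain named numeric vector, and cross-checks one patient against lm().

```r
# Toy data: 5 hypothetical patients with 9 samples each (simulated, not the poster's data).
set.seed(1)
n.patients <- 5
n.samples <- 9
the.data.frame <- data.frame(
  patientID = rep(seq_len(n.patients), each = n.samples),
  x = rep(seq_len(n.samples), times = n.patients)
)
# Each patient gets a known slope equal to its ID, plus a little noise.
the.data.frame$y <- 2 + the.data.frame$patientID * the.data.frame$x +
  rnorm(nrow(the.data.frame), sd = 0.1)

# One least-squares fit per patient; element 2 of the coefficients is the slope.
results <- by(the.data.frame, the.data.frame$patientID,
              function(z) coef(lsfit(z$x, z$y))[2])

# Drop the "by" class and attributes to get a plain named numeric vector.
slopes <- as.numeric(unlist(results))
names(slopes) <- names(results)

# Cross-check patient 1 against lm(), which uses the y ~ x order.
first <- the.data.frame[the.data.frame$patientID == 1, ]
lm.slope <- coef(lm(y ~ x, data = first))[2]
```

Here slopes is indexed by patient ID (as a character name), so slopes[["1"]] should agree with lm.slope up to rounding error.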
 
Let us know how this all works out for you. Specifically, did any of the suggestions give you an answer in a "reasonable" amount of time, where you define "reasonable"? Large data sets **CAN** be a problem, but 500,000 x 8 with little covariate information doesn't sound all that large, especially these days.
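One possible way to judge whether an approach is "reasonable" before committing to the full run is to time it on a small pilot subset and extrapolate. The sketch below is written against R and uses simulated data; the pilot size of 2,000 patients and the names pilot, n.pilot, elapsed and projected.hours are assumptions made for the example. The extrapolation is deliberately naive.

```r
# Simulated stand-in for the real data: 2,000 hypothetical patients, 9 samples each.
set.seed(2)
n.pilot <- 2000
n.samples <- 9
pilot <- data.frame(
  patientID = rep(seq_len(n.pilot), each = n.samples),
  x = rep(seq_len(n.samples), times = n.pilot)
)
pilot$y <- 1 + 0.5 * pilot$x + rnorm(nrow(pilot))

# Time the per-patient fits on the pilot subset only.
elapsed <- system.time(
  pilot.slopes <- by(pilot, pilot$patientID,
                     function(z) coef(lsfit(z$x, z$y))[2])
)[["elapsed"]]

# Naive linear extrapolation to the full 500,000 patients (an assumption:
# real run time may grow faster than linearly with the number of groups).
n.full <- 500000
projected.hours <- elapsed * (n.full / n.pilot) / 3600
```

If projected.hours comes out far above what is tolerable, that is a cheap signal to try a different approach before starting the full job.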
 
Cheers,

Bert Gunter
Biometrics Research RY 33-300
Merck & Company
P.O. Box 2000
Rahway, NJ 07065-0900
Phone: (732) 594-7765
mailto: bert_gunter@merck.com

"The business of the statistician is to catalyze the scientific learning process."      -- George E.P. Box

 
 
 
-----Original Message-----
From: s-news-owner@lists.biostat.wustl.edu [mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Pravin
Sent: Monday, March 01, 2004 7:39 PM
To: s-news@wubios.wustl.edu
Subject: [S] S-PLUS Vs some other softwares

Hi all,

I have (almost) always written S-PLUS code in which a for() loop looked indispensable to ME. Since it did the job at the expense of slightly more dos time, I never looked at the alternatives. This time I have a very simple problem, and I thought a for() loop should be able to do the job. But it didn't!

I am doing a permutation experiment that requires me to analyze data from 500,000 patients (9 samples per patient), and all I want is to fit a linear regression model for each patient and extract the slope estimate. After running my computer for 16 hrs (CPU usage suggested it was computing the whole time), S-PLUS had reached patient number 19,000. Is there any quicker way of doing this in S-PLUS? Or, as I always hear, is S-PLUS limited in its ability to handle huge datasets, so that I have to look for some other software that can do this huge computational task really quickly? Any recommendations?

LOOP:
nsub <- 500000
for (i in 1:nsub)
        { od.fit <- lm(data.y ~ data.x, data = ...)   # per-patient subset (expression truncated in the original)
        slope[i,] <- coef(od.fit)[2] }
               
Thanks much,

Pravin

Pravin Jadhav


