s-news

Re: Time management in S-Plus

To: "Michael Camilleri" <MichaelCamilleri@branz.co.nz>, "Herschtal Alan" <Alan.Herschtal@petermac.org>, <s-news@lists.biostat.wustl.edu>
Subject: Re: Time management in S-Plus
From: "Biggerstaff, Brad J. (CDC/CCID/NCZVED)" <bkb5@cdc.gov>
Date: Tue, 5 Jun 2007 08:30:07 -0600
References: <EBF85749A3E4944EBD55E1C4C49DB53B026A33D6@PMC-EMAIL.petermac.org.au> <A23B682083FD8248ADFE1317C82CBD670290328A@exchange.branznt.org.nz>
Thread-index: AcemRv2eeIz8MFGJS52a5HTnkySQxQApOEXgACQ0pdA=
Thread-topic: [S] Time management in S-Plus
As the S-Plus manual suggests regarding such slow-downs, I recently re-coded some for-loops using lapply; double loops can be done with nested lapply calls.  The improvement was dramatic (hours to minutes), and printing the iteration counter (with, say, print(i)) showed that the lapply version does not slow down as it runs.  An example is given in the manual; another is:
 

# Pre-allocate the result as a list of lists
NTests <- vector(length=nSim, mode="list")
for(i in 1:nSim)
       NTests[[i]] <- vector(length=nSchemes, mode="list")

#
# OLD and SLOW
#
#for(i in 1:nSim){
#      for(j in 1:nSchemes){
#             # Number of pools tested, either ID (1) or MP (PoolSize)
#             # and add the number of ID tests for positive MPs (re-test of MP-samples)
#             NTests[[i]][[j]] <- NSim[[i]] %/% ifelse(schemes[[i]][[j]],1,PoolSize) +
#                                               ifelse(schemes[[i]][[j]],0,(sum(PoolsSim[[i]][[j]]) * PoolSize))
#      }
#}

# much faster...
# sch[[i]][[j]] is TRUE (1) for ID and FALSE (0) for MP, so
# pool.size^(1-sch) is 1 for ID and pool.size for MP, and
# multiplying by (1-sch) zeroes the re-test term for ID, as in the loop above.
NTests <- lapply(1:nSim, function(i, pos, n, schemes, nschemes, pool.size, ntests){
       x <- ntests[[i]]
       x <- lapply(1:nschemes, function(j, i, sch, pos, n, pool.size){
              n[[i]] %/% (pool.size^(1-sch[[i]][[j]])) +
                     (sum(pos[[i]][[j]])*pool.size) * (1-sch[[i]][[j]])
       }, i=i, sch=schemes, pos=pos, n=n, pool.size=pool.size)
       x
}, pos=PoolsSim, n=NSim, schemes=schemes, nschemes=nSchemes, pool.size=PoolSize, ntests=NTests)

 

Passing some of the arguments (the subscripted ones) as lists is key, so reformulating them into lists may be needed if the original objects do not inherit from list.
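As a rough illustration of that last point, here is a minimal sketch of turning a non-list object into a list before handing it to lapply. The object names (pools.mat, pools.list, positives) and the data are made up for the example, and the code is written in plain S syntax that also runs in R; I have not run this exact snippet in S-Plus.

```r
# Hypothetical data: pool results stored as a matrix, one row per simulation
pools.mat <- matrix(c(1, 0, 1,
                      0, 0, 1), nrow = 2, byrow = TRUE)

# Re-formulate the matrix as a list with one element per row,
# so that [[i]] subscripting works inside lapply
pools.list <- lapply(1:nrow(pools.mat), function(i, m) m[i, ], m = pools.mat)

# The list is passed in as an explicit argument and subscripted by element
positives <- lapply(1:length(pools.list),
                    function(i, pos) sum(pos[[i]]),
                    pos = pools.list)
```

Here positives is a list with one count of positive pools per simulation, built without an explicit for-loop.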
 
Brad

Brad Biggerstaff, Ph.D.
Mathematical Statistician
Division of Vector-Borne Infectious Diseases
National Center for Infectious Diseases
Centers for Disease Control and Prevention
P.O. Box 2087
Fort Collins, Colorado  80522-2087
(970) 221-6473 ...
BBiggerstaff@cdc.gov



From: s-news-owner@lists.biostat.wustl.edu [mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Michael Camilleri
Sent: Monday, June 04, 2007 3:10 PM
To: Herschtal Alan; s-news@lists.biostat.wustl.edu
Subject: Re: [S] Time management in S-Plus

S+ has a few other traps when you are dealing with large processing jobs.

 

One which I discovered is that S+ 7 for Windows has a limit of 1,000 objects in the Restore Data Objects. If you create a new object in a chapter with each iteration, then after 1,000 objects S+ will slow down dramatically and basically grind to a halt, no matter how small the objects are and no matter how much RAM you have. Avoid this kind of assignment. If you need to create new objects with each iteration, create a list object with the required number of elements and assign the objects to the list instead.
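A minimal sketch of the list approach follows. The names n.iter and results, and the i^2 stand-in for the real per-iteration result, are placeholders for illustration only; the syntax is plain S that also runs in R.

```r
n.iter <- 5   # hypothetical number of iterations

# Avoid: one new top-level object per iteration, e.g.
#   assign(paste("result", i, sep = ""), value)

# Instead: pre-allocate a single list and fill its elements
results <- vector(length = n.iter, mode = "list")
for(i in 1:n.iter)
        results[[i]] <- i^2   # stand-in for the real per-iteration result

# Optional: name the elements so they can still be looked up by name
names(results) <- paste("result", 1:n.iter, sep = "")
```

Only one object (results) is created in the chapter, however many iterations are run.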

 

This is all to do with the automatic Restore Data Objects. If you create a brand new chapter with undo specifically turned off, S+ won’t create restore objects, but as soon as you re-open the chapter it starts restoring objects again.

 

Personally I don’t like the data restore feature. Early versions of S+ didn’t have it, and you just got used to saving scripts and backing up important objects (which is good analysis/programming practice). When S+ became a “Windows” application we got the undo and a pile of other useless features.

 

Michael

 


 

MICHAEL CAMILLERI BSc, MSc, PhD

BUILDING PHYSICIST

T +64 4 237 1170

DDI +64 4 237 1174

PRIVATE BAG 50908

PORIRUA CITY 5240

WWW.BRANZ.CO.NZ

 


From: Herschtal Alan [mailto:Alan.Herschtal@petermac.org]
Sent: Monday, 4 June 2007 1:24 p.m.
To: s-news@lists.biostat.wustl.edu
Subject: Time management in S-Plus

 

 

I am trying to run many iterations of a simulation in S-Plus, and am finding that the time taken per iteration increases almost exponentially. I have tried to write the code as efficiently as I can, with minimal use of for loops, vector arithmetic wherever possible, and reusing datasets where feasible. I understand that this phenomenon has something to do with the paging in S-Plus's memory management system. Is there any way to keep the execution time constant?

Thanks,

 

Alan Herschtal
Biostatistician
Centre for Biostatistics and Clinical Trials
Peter MacCallum Cancer Centre
Ph: 9656 3639
