As the S-Plus manual suggests for slow-downs of this kind, I recently re-coded
some for-loops using lapply (double loops can be done with nested lapplys).
The improvement was dramatic (hours to minutes), and printing the iteration
counter (with, say, print(i)) showed no slow-down in the lapply version. One
example is given in the manual; here is another:

NTests <- vector(length=nSim, mode="list")
for(i in 1:nSim)
    NTests[[i]] <- vector(length=nSchemes, mode="list")
#
# OLD and SLOW
#for(i in 1:nSim){
#    for(j in 1:nSchemes){
#        #
#        # Number of pools tested, either ID (1) or MP (PoolSize)
#        # and add the number of ID tests for positive MPs (re-test of MP-samples)
#        #
#        NTests[[i]][[j]] <- NSim[[i]] %/% ifelse(schemes[[i]][[j]],1,PoolSize) +
#            ifelse(schemes[[i]][[j]],0,(sum(PoolsSim[[i]][[j]]) * PoolSize))
#    }
#}
# much faster...
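NTests <- lapply(1:nSim, function(i, pos, n, schemes, nschemes, pool.size, ntests){
    x <- ntests[[i]]
    x <- lapply(1:nschemes, function(j, i, sch, pos, n, pool.size){
        # pool.size^(1-sch) is 1 for ID schemes and pool.size for MP schemes
        n[[i]] %/% (pool.size^(1-sch[[i]][[j]])) +
            # multiply by (1-sch) so the re-test term is 0 for ID schemes, as in
            # the loop version (raising to the power 0 would give 1, not 0)
            (1-sch[[i]][[j]]) * sum(pos[[i]][[j]]) * pool.size
    }, i=i, sch=schemes, pos=pos, n=n, pool.size=pool.size)
    x
}, pos=PoolsSim, n=NSim, schemes=schemes, nschemes=nSchemes, pool.size=PoolSize, ntests=NTests)
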
Passing some of the arguments (the sub-scripted ones) as lists is key, so
re-formulating into lists may be needed if the original objects do not inherit
from list.
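
A minimal sketch of that last point, on toy data: a matrix is first re-formulated as a list so that it can be sub-scripted with [[i]], and a nested lapply is then checked against the equivalent double loop. The names m, rows, out.loop and out.lapply are hypothetical, and the code is written in R syntax on the assumption that it carries over to S-Plus.

```r
# Toy input: a matrix (not a list); each row stands in for one simulation
m <- matrix(1:6, nrow = 2)
n.row <- nrow(m)
n.col <- ncol(m)

# Re-formulate as a list of rows so elements can be sub-scripted with [[i]]
rows <- lapply(1:n.row, function(i, mat) mat[i, ], mat = m)

# Double for-loop version, filling a pre-allocated list of lists
out.loop <- vector("list", n.row)
for(i in 1:n.row) {
    out.loop[[i]] <- vector("list", n.col)
    for(j in 1:n.col) out.loop[[i]][[j]] <- rows[[i]][j]^2
}

# Nested lapply version, passing the list explicitly as an argument
out.lapply <- lapply(1:n.row, function(i, x, n.col) {
    lapply(1:n.col, function(j, i, x) x[[i]][j]^2, i = i, x = x)
}, x = rows, n.col = n.col)

# Both versions should hold the same values in the same structure
identical(out.loop, out.lapply)
```

On data this small the two versions take about the same time; the sketch only shows that the results agree.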
Brad
Brad Biggerstaff, Ph.D.
Mathematical Statistician Division of Vector-Borne Infectious Diseases
National Center for Infectious Diseases Centers for Disease Control and
Prevention P.O. Box 2087 Fort Collins, Colorado 80522-2087
(970) 221-6473 ... BBiggerstaff@cdc.gov

S+ has a few other traps when you are dealing with large processing jobs.

One which I discovered is that S+ 7 for Windows has a limit of 1,000 objects
in the Restore Data Objects. If you create a new object in a chapter on each
iteration, then after 1,000 objects S+ slows down dramatically and basically
grinds to a halt, no matter how small the objects are and no matter how much
RAM you have. Avoid this kind of assignment. If you need to create new objects
on each iteration, create a list object with the required number of elements
and assign the objects to the list instead.
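
The list-based approach described above can be sketched as follows. This is only an illustration: n.iter, results and the i^2 payload are hypothetical stand-ins, and the code uses R syntax on the assumption that it behaves the same way in S-Plus.

```r
n.iter <- 5   # hypothetical number of iterations

# Avoid: creating a new top-level object on every iteration, e.g.
#   assign(paste("result", i, sep = ""), i^2)

# Instead: allocate one list up front and fill its elements
results <- vector("list", n.iter)
for(i in 1:n.iter) {
    results[[i]] <- i^2   # stands in for whatever each iteration produces
}

# Optional: name the elements so they can still be looked up by name
names(results) <- paste("result", 1:n.iter, sep = "")

length(results)
results[["result3"]]
```

Only one object (results) is created in the chapter, however many iterations are run.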

This is all to do with the automatic Restore Data Objects. If you create a
brand new chapter with undo specifically turned off it won’t do it, but as
soon as you re-open the chapter it starts restoring objects again.

Personally I don’t like the data restore feature. Early versions of S+ didn’t
have it, and you just got used to saving scripts and backing up important
objects (which is good analysis/programming practice). When S+ became a
“Windows” application we got the undo and a pile of other useless features.

Michael

MICHAEL CAMILLERI BSc, MSc, PhD
BUILDING PHYSICIST
PRIVATE BAG 50908
PORIRUA CITY 5240
WWW.BRANZ.CO.NZ

From: Herschtal Alan [mailto:Alan.Herschtal@petermac.org]
Sent: Monday, 4 June 2007 1:24 p.m.
To: s-news@lists.biostat.wustl.edu
Subject: Time management in S-Plus

I am trying to run many iterations of a simulation in S-plus, and am finding
that the time taken per iteration increases almost exponentially. I have tried
to write the code as efficiently as I can, with minimal use of for loops,
vector arithmetic wherever possible, and reusing datasets where feasible. I
understand that this phenomenon has something to do with the paging in
S-plus's memory management system. Is there any way to keep the execution time
constant?

Thanks,
Alan Herschtal
Biostatistician
Centre for Biostatistics and Clinical Trials
Peter MacCallum Cancer Centre
Ph: 9656 3639