Hello there,
I am running:
S-PLUS : Copyright (c) 1988, 2007 Insightful Corp.
S : Copyright Insightful Corp.
Version 8.0.4 for Linux 2.4.21-37.EL, 64-bit : 2007
Running "summary.glm" twice on the output of "dglm" (a function by Gordon
Smyth for dispersion modeling) takes roughly 20 minutes. I made a private
version of "summary.glm" to track down where the bottleneck is.
Here is what I found:
the instruction "wt <- wt^0.5" is the one eating up all the time.
I then inserted a "return(wt)" statement in the function and played around
with this named vector. Removing the names made essentially no difference
to the timing. I tracked the problem down to taking the square root of
numbers very close to 1. To make a long story short, here is the
same command run on the machine above (64-bit Linux) and on a
32-bit Windows version:
--------------------------------------------------------------------------------
ttt.test <- sample(c(0.99999999999999978, 0.99999999999999978,
1.0000000000000002, 1.0000000000000002,
1.0000000000000002, 1.0000000000000002,
0.99999999999999978, 1.0000000000000002,
0.99999999999999978, 0.99999999999999978),
size = 826015, replace = T)
class(ttt.test)
[1] "numeric"
mysummary(ttt.test)
Min. 1st Qu. Median
9.9999999999999985e-03 9.9999999999999985e-03 9.9999999999999985e-03
Mean 3rd Qu. Max.
1.0000000000000000e-02 1.0000000000000002e-02 1.0000000000000002e-02
N Sum
8.2601500000000000e+05 8.2601500000000000e+05
length(ttt.test)
[1] 826015
resources(ttt.test.sqrt <- ttt.test^0.5)
User time = 0 h. 12 min. 39.060000000000172804 s.
System time = 0 h. 0 min. 0.019999999999999574 s.
CPU time = 0 h. 12 min. 39.080000000000154614 s.
Elapsed time = 0 h. 12 min. 39.860000000000582077 s.
Child = 0 h. 0 min. 0.000000000000000000 s.
% CPU = 99.9
Memory usage:
Cache = 0 Bytes
Working = 13.217104M Bytes
--------------------------------------------------------------------------------
Almost 13 minutes to take 826015 square roots of numbers near 1?
Now on the 32-bit Windows machine:
S-PLUS : Copyright (c) 1988, 2007 Insightful Corp.
S : Copyright Insightful Corp.
Enterprise Developer Version 8.0.4 for Microsoft Windows : 2007
ttt.test <- sample(c(0.99999999999999978, 0.99999999999999978,
1.0000000000000002, 1.0000000000000002,
1.0000000000000002, 1.0000000000000002,
0.99999999999999978, 1.0000000000000002,
0.99999999999999978, 0.99999999999999978),
size = 826015, replace = T)
class(ttt.test)
[1] "numeric"
options(digits = 17)
mysummary(ttt.test)
               Min.             1st Qu.             Median Mean            3rd Qu.
0.99999999999999978 0.99999999999999978 1.0000000000000002    1 1.0000000000000002
              Max.      N                    Sum
1.0000000000000002 826015 8.2601500000000000e+05
length(ttt.test)
[1] 826015
resources(ttt.test.sqrt <- ttt.test^0.5)
User time = 0 h. 0 min. 0.260999999999999900 s.
System time = 0 h. 0 min. 0.010000000000000009 s.
CPU time = 0 h. 0 min. 0.270999999999999910 s.
Elapsed time = 0 h. 0 min. 0.271000000000015010 s.
Child = 0 h. 0 min. 0.000000000000000000 s.
% CPU = 100
Memory usage:
Cache = 0 Bytes
Working = 13.232343M Bytes
--------------------------------------------------------------------------------
In other words, the 32-bit Windows run is
(12 * 60 + 39.860000000000582077) / 0.271000000000015010
[1] 2803.9114391142380
times faster. I know that a problem that fits in 32 bits can run faster in
32-bit than in 64-bit, but I doubt very much that this explains such a
difference. Am I missing something?
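One way to check whether the slowdown is specific to S-PLUS is to time the same operation outside it on the same machine. The following is only a rough sketch in Python, not part of the S-PLUS session above. It assumes a CPython interpreter, where `**` on floats is forwarded to the platform C library's pow() and `math.sqrt` to its sqrt(). The helper names `make_sample`, `time_call` and `power_half` and the seed are made up for illustration.

```python
import math
import random
import time

# The two distinct values from the post: 1 - 2**-52 and 1 + 2**-52.
LOW = 0.99999999999999978
HIGH = 1.0000000000000002


def make_sample(size=826015, seed=0):
    """Draw a sample with the same 5:5 mix of values as ttt.test.

    The seed is arbitrary (an assumption for reproducibility only).
    """
    rng = random.Random(seed)
    return [rng.choice((LOW, HIGH)) for _ in range(size)]


def time_call(func, values):
    """Apply func to every value; return (elapsed seconds, results)."""
    start = time.perf_counter()
    results = [func(v) for v in values]
    return time.perf_counter() - start, results


def power_half(v):
    """Square root via the general power routine, as in wt^0.5."""
    return v ** 0.5


if __name__ == "__main__":
    sample = make_sample()
    t_pow, by_pow = time_call(power_half, sample)
    t_sqrt, by_sqrt = time_call(math.sqrt, sample)
    print(f"x ** 0.5 : {t_pow:.3f} s")
    print(f"sqrt(x)  : {t_sqrt:.3f} s")
    print("max abs difference:",
          max(abs(a - b) for a, b in zip(by_pow, by_sqrt)))
```

If the power version is slow here too while the sqrt version is not, that would point at the system math library rather than at S-PLUS itself. In that case, replacing "wt^0.5" with "sqrt(wt)" in the private copy of "summary.glm" might be worth trying.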
Thanks for any support,
Gérald Jean
Senior Statistical Advisor, Actuarial
telephone: (418) 835-4900 ext. 7639
fax: (418) 835-6657
email: gerald.jean@dgag.ca
"In God we trust, all others must bring data" W. Edwards Deming