Jean V Adams wrote:
I'm writing a function to carry out a simulation, and I'd like to try and
speed it up. Most parts of the function run fast (random number
generation, arithmetic operations, all array operations, no loops) until I
get near the end, where I conduct about 100,000 t-tests inside of a for()
loop. Here's a simplified example with two three-dimensional arrays
(4x3x2) and one t-test in a for() loop.
# data
dims <- c(4, 3, 2)
a <- array(rnorm(prod(dims)), dim=dims)
b <- array(rnorm(prod(dims)), dim=dims)
# for loop
m <- array(NA, dim=dims[1:2])
for(i in 1:dims[1]) {
for(j in 1:dims[2]) {
m[i, j] <- t.test(a[i, j, ], b[i, j, ])$p.value<0.05
}}
In reality, I have six three-dimensional arrays (1000x20x50) and six
t-tests in a for() loop. It took me 3.5 minutes to run the above example
setting dims to c(1000, 20, 50). I would appreciate any hints as to how to
speed this up.
I'm using S-PLUS® 6.2 for Windows PROFESSIONAL EDITION with XP operating
system.
JVA
You can build the t-test directly from the means & variance using
rowMeans and rowVars:
set.seed(1)
dims <- c(100, 3, 2)
a <- array(rnorm(prod(dims)), dim=dims)
b <- array(rnorm(prod(dims)), dim=dims)
# for loop
t1 <- proc.time()
mt <- mp <- array(NA, dim=dims[1:2])
for(i in 1:dims[1]) {
for(j in 1:dims[2]) {
tt <- t.test(a[i, j, ], b[i, j, ])
mt[i, j] <- tt$statistic
mp[i, j] <- tt$p.value
}
}
t2 <- proc.time()
t3 <- proc.time()
ma <- rowMeans(a, dims = 2)
mb <- rowMeans(b, dims = 2)
ssa <- rowVars(a, SumSquares = TRUE, dims = 2)
ssb <- rowVars(b, SumSquares = TRUE, dims = 2)
s <- sqrt((ssa + ssb)/(2 * dims[3] - 2))
tarr <- (ma - mb)/s
ptarr <- 2 * (1 - pt(abs(tarr), 2 * dims[3] - 2))
ptarr <- array(ptarr, dim = dims[1:2])
t4 <- proc.time()
|