Normally, to do a permutation test for the correlation between x and y,
you would permute one of the variables, not pool the data.
E.g. if x and y are variables in a data frame, then:
> data <- data.frame(x = rnorm(100), y = rnorm(100))
> permutationTest(data, resampCor, resampleColumns = "x")
Call:
permutationTest(data = data, statistic = resampCor, resampleColumns = "x")
Number of Replications: 999
Summary Statistics:
         Observed     Mean      SE alternative p-value
cor(x,y)  -0.0564 0.004596 0.09524   two.sided   0.514
This permutes the "x" column, leaving "y" fixed. It is appropriate
for testing the null hypothesis that x and y are independent against
the alternative that they are related.
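The permutation test above can be sketched outside S-PLUS as well. The following is a minimal Python sketch, not the resample library's implementation. The helper names `pearson_r` and `perm_test_cor` are hypothetical. The two-sided p-value here counts permutations with |r| at least as large as the observed |r|, which may differ slightly from how permutationTest combines the two tails.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def perm_test_cor(x, y, B=999, seed=None):
    """Two-sided permutation test of H0: x and y are independent.

    Hypothetical helper: only x is shuffled and y stays fixed, which is
    the idea behind resampleColumns = "x".
    """
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    x_perm = list(x)
    count = 0
    for _ in range(B):
        rng.shuffle(x_perm)  # permute x only; pairing with y is broken
        if abs(pearson_r(x_perm, y)) >= abs(observed):
            count += 1
    # The observed arrangement counts as one of the B + 1 permutations.
    p_value = (count + 1) / (B + 1)
    return observed, p_value
```

For example, `perm_test_cor(x, y, B=999, seed=1)` returns the observed correlation and its permutation p-value; fixing `seed` makes the result reproducible.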
In contrast, pooling the data would be appropriate for comparing two
samples, e.g. testing whether a difference in means (or another
statistic) is significantly different from zero, or whether a ratio of
means (or another statistic) is significantly different from one.
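A minimal Python sketch of the pooled two-sample approach, using a difference in means as the statistic. The function name `perm_test_mean_diff` is hypothetical, this is not the resample library's code, and the two-sided p-value is based on the absolute difference.

```python
import random
import statistics


def perm_test_mean_diff(a, b, B=999, seed=None):
    """Two-sided permutation test of H0: a and b come from the same
    distribution, using the difference in means as the statistic.

    Hypothetical helper: the two samples are pooled, reshuffled, and split
    back into groups of the original sizes.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(a) - statistics.fmean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    count = 0
    for _ in range(B):
        rng.shuffle(pooled)  # random relabelling of the pooled data
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            count += 1
    # The observed split counts as one of the B + 1 arrangements.
    p_value = (count + 1) / (B + 1)
    return observed, p_value
```

A ratio of means could be tested the same way by replacing the difference with a ratio and comparing against one instead of zero.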
Tim Hesterberg
========================================================
| Tim Hesterberg Senior Research Scientist |
| timh@insightful.com Insightful Corp. |
| (206)802-2319 1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax) Seattle, WA 98109-3044, U.S.A. |
| www.insightful.com/Hesterberg |
========================================================
>Dear S-News Members!
>
>I've got a particular problem and I'm not sure whether a permutation test is the
>right way to solve it, though I'd intuitively use one.
>
>(1) I've got two samples x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn), both
>of the same size n, which I'd like to combine into one big sample.
>(2) Then I'd like to draw n elements at random, without replacement, from the
>combined sample, forming the first NEW sample.
>(3) The elements of the big sample that weren't drawn automatically form the
>second NEW sample.
>(4) Then I'd like to evaluate the statistic of interest: a correlation
>coefficient.
>(5) If I repeat this procedure B = 1000 times, I should get an empirical
>estimate of the variability of the correlation coefficient, which is what I'm
>interested in.
>
>To do steps (1) to (5) I've used permutationTest from the resample
>library. Here's the syntax:
>> permutationTest(data = XY [c("x", "y")], statistic =
>> resampCor(data, resampleColumns = "x"), alternative =
>> "two.sided", resampleColumns = "x")
>
>My questions are:
>i) Is this the right syntax for steps (1) to (5)?
No, but I doubt you want to do that.
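For comparison, a literal reading of steps (1) to (5) can be sketched in Python. The question doesn't say how the two new samples are paired when computing a correlation, so this sketch assumes the i-th elements form the i-th pair; the function names are hypothetical. Because that pairing is arbitrary after a random split, the resulting correlations scatter roughly around zero whatever the relationship between the original x and y, so the procedure does not appear to measure that relationship.

```python
import random
import statistics


def pearson_r(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def pooled_split_correlations(x, y, B=1000, seed=None):
    """Hypothetical literal version of steps (1)-(5).

    Assumption: the i-th elements of the two new samples form the i-th pair.
    """
    if len(x) != len(y):
        raise ValueError("steps (1)-(5) assume equal sample sizes")
    rng = random.Random(seed)
    n = len(x)
    pooled = list(x) + list(y)  # step (1): one big sample of size 2n
    results = []
    for _ in range(B):  # step (5): repeat B times
        rng.shuffle(pooled)
        first = pooled[:n]  # step (2): n elements drawn without replacement
        second = pooled[n:]  # step (3): the remaining n elements
        results.append(pearson_r(first, second))  # step (4)
    return results
```

Even with perfectly correlated inputs such as `x = list(range(50))` and `y = [2 * v for v in x]`, the mean of the returned correlations stays close to zero.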
>ii) Furthermore, if I use resampleColumns = "x", which the S-PLUS resample
>help recommends for correlations, how exactly should I understand this? Is
>only column x permuted, with column y left unchanged?
Yes.
>iii) Most important question: is there another, better way to do (1) to (5)?
See above.
>
>Can anybody please help me?
>Thank you!
>
>Yours sincerely,
>Weigl Klemens