Note that you do not need to turn a vector into a matrix before
calling colSums; it is faster to use the "n" argument:
> set.seed(1)
> d <- data.frame(x=sample(c(10:14,100),50,rep=T))
> sys.time(res <- sapply(1:10000, function(i, d) sum(sample(d$x, 10, rep=T)),
+ d))
[1] 10.751 10.829
> sys.time(res <- colSums(matrix(sample(d$x, 10*10000, rep=T), ncol=10000)))
[1] 0.094 0.125
> sys.time(res <- colSums(sample(d$x, 10*10000, rep=T), n=10))
[1] 0.062 0.078
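For readers running R rather than S-PLUS, here is a minimal sketch of the same "draw everything at once, then sum" idea. It assumes R, where the timer is system.time() rather than sys.time() and colSums() has no "n" argument (it requires a matrix or array). The draws are therefore reshaped explicitly with matrix(nrow = n). The names B, n, draws and res_loop are illustrative only. No timings are quoted because they depend on the machine.

```r
# Sketch for R (assumption: not S-PLUS). R's colSums() needs a matrix,
# so the vector of draws is reshaped with matrix(nrow = n) first.
set.seed(1)
d <- data.frame(x = sample(c(10:14, 100), 50, replace = TRUE))

B <- 10000  # number of replicates, as in the timings above
n <- 10     # values summed per replicate

# matrix() fills column by column, so each of the B columns holds
# one sample of size n drawn with replacement from d$x.
draws <- matrix(sample(d$x, n * B, replace = TRUE), nrow = n)
res <- colSums(draws)

# Sanity check: summing each column separately gives the same answer.
res_loop <- apply(draws, 2, sum)
stopifnot(length(res) == B, isTRUE(all.equal(res, res_loop)))

# R's counterpart of the sys.time() measurements above.
print(system.time(colSums(matrix(sample(d$x, n * B, replace = TRUE), nrow = n))))
```

Because every value of d$x lies between 10 and 100, each sum of 10 draws must fall between 100 and 1000, which gives a quick plausibility check on res.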
Here's another option. It is no faster in this case, but it generalizes
to creating bootstrap sums for matrices as well as vectors:
> library(resample)
> sys.time(res <- bootstrapSums(d$x,
+ samp.bootstrap(n=nrow(d), B=10000, size = 10)))
[1] 0.062 0.109
Tim Hesterberg
>It's easy to measure whether taking all the samples at once is more
>efficient:
>
> > set.seed(1)
> > d <- data.frame(x=sample(c(10:14,100),50,rep=T))
> > sys.time(res <- sapply(1:10000, function(i, d) sum(sample(d$x, 10, rep=T)), d))
>[1] 5.766 5.844
> > sys.time(res <- colSums(matrix(sample(d$x, 10*10000, rep=T), ncol=10000)))
>[1] 0.031 0.047
> >
>
>(Here I took the sum of 10,000 samples rather than 1,000 in order to get
>measurable times.)
>
>In general, I find it very worthwhile to measure things that "should" be
>faster in S-PLUS, because the results can sometimes be quite surprising.
>
>-- Tony Plate
>
>David L Lorenz wrote:
>>
>> Hi,
>> It is probably a little more efficient to take all of the samples and
>> then process them. So another way:
>>
>> res <- colSums(matrix(sample(d$x, 10*1000, rep=T), ncol=1000))
>>
>> Dave
>>
>>
>> From: Tony Plate <tplate@blackmesacapital.com>
>> Sent by: s-news-owner@lists.biostat.wustl.edu
>> Date: 04/20/2006 07:13 PM
>> To: "Khan, Sohail" <khan@cshl.edu>
>> Cc: s-news@lists.biostat.wustl.edu
>> Subject: Re: [S] Generating a vector dynamically
>>
>> Here's one way to do what I think you want:
>>
>> > set.seed(1)
>> > # Generate a sample data frame with one column 'x'
>> > d <- data.frame(x=sample(c(10:14,100),50,rep=T))
>> > # Create a vector with the sum of 1000 samples of size
>> > # 10 (with replacement)
>> > res <- sapply(1:1000, function(i) sum(sample(d$x, 10, rep=T)))
>> > hist(res)
>> >
>>
>> Khan, Sohail wrote:
>> > Dear List
>> >
>> > I want to write a "little" piece of code that would:
>> >
>> > -- take a random sample of 10 values from a data.frame column
>> > -- sum these values
>> > -- store the sum in a vector
>> > -- repeat 1000 times
>> >
>> > I would later draw a histogram of this generated vector.
>> > Thanks in advance for any advice/suggestions.
>> >
>> > Sohail Khan
>> > Scientific Programmer
>> > COLD SPRING HARBOR LABORATORY
>> > Genome Research Center
>> > 500 Sunnyside Boulevard
>> > Woodbury, NY 11797
>> > (516)422-4076