s-news

Re: Generating a vector dynamically

To: Tony Plate <tplate@blackmesacapital.com>
Subject: Re: Generating a vector dynamically
From: Tim Hesterberg <timh@insightful.com>
Date: 21 Apr 2006 11:20:13 -0700
Cc: David L Lorenz <lorenz@usgs.gov>, "Khan, Sohail" <khan@cshl.edu>, s-news@lists.biostat.wustl.edu
In-reply-to: <4448F5A2.6040607@blackmesacapital.com> (message from Tony Plate on Fri, 21 Apr 2006 09:09:22 -0600)
References: <OF5C1A9CAF.9AECD949-ON86257157.004475F0-86257157.0044B178@usgs.gov> <4448F5A2.6040607@blackmesacapital.com>

Note that you do not need to turn a vector into a matrix before
calling colSums; it is faster to use the "n" argument (the number
of rows), which sums each consecutive block of n values:

> set.seed(1)
> d <- data.frame(x=sample(c(10:14,100),50,rep=T))
> sys.time(res <- sapply(1:10000, function(i, d) sum(sample(d$x, 10, rep=T)),
+                         d))
[1] 10.751 10.829
> sys.time(res <- colSums(matrix(sample(d$x, 10*10000, rep=T), ncol=10000)))
[1] 0.094 0.125
> sys.time(res <- colSums(sample(d$x, 10*10000, rep=T), n=10))
[1] 0.062 0.078
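The vectorized versions work because the long vector is cut into consecutive blocks of 10, one block per bootstrap sample. The sketch below assumes R rather than S-PLUS. In R, `colSums` takes `x`, `na.rm` and `dims` but has no `n` argument, so the vector is reshaped with `matrix(..., nrow = 10)` first. The sketch checks, on a single set of draws, that the block sums from `colSums` match sums computed one sample at a time.

```r
# R sketch (not S-PLUS): same setup as the thread.
set.seed(1)
d <- data.frame(x = sample(c(10:14, 100), 50, replace = TRUE))
B <- 10000    # number of bootstrap sums
size <- 10    # values per sample

# One call to sample() draws every value needed for all B samples.
draws <- sample(d$x, size * B, replace = TRUE)

# matrix() fills column by column, so column j holds draws
# (j-1)*size + 1 through j*size, i.e. the j-th sample.
res_vec <- colSums(matrix(draws, nrow = size))

# The same sums computed one block at a time, for comparison.
res_loop <- sapply(seq_len(B), function(j) sum(draws[((j - 1) * size + 1):(j * size)]))

stopifnot(isTRUE(all.equal(res_vec, res_loop)))
```

Because `matrix()` fills columns first, no transposition is needed. If the matrix were filled by rows, the same 10,000 sums would come from `rowSums` on a `B`-by-10 matrix instead.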

Here's another option, no faster in this case, but this generalizes
to creating bootstrap sums for matrices as well as vectors:

> library(resample)
> sys.time(res <- bootstrapSums(d$x,
+                             samp.bootstrap(n=nrow(d), B=10000, size = 10)))
[1] 0.062 0.109
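One way to see how the idea extends from a vector to a matrix is to record each bootstrap sample as a vector of counts, that is, how many times each row was drawn. The bootstrap sums for every column are then a single matrix product. This is a hypothetical base R sketch of that idea, not the resample library's `bootstrapSums` implementation. The function name `boot_sums_counts` is made up for illustration.

```r
# Hypothetical helper (not part of the resample library).
# X:   numeric vector or matrix; rows are the observations to resample.
# idx: size-by-B matrix of row indices, one bootstrap sample per column.
# Returns an ncol(X)-by-B matrix of bootstrap column sums.
boot_sums_counts <- function(X, idx) {
  X <- as.matrix(X)
  n <- nrow(X)
  # counts[i, b] = number of times row i appears in sample b.
  counts <- matrix(apply(idx, 2, tabulate, nbins = n), nrow = n)
  # crossprod(X, counts) is t(X) %*% counts: entry [k, b] is the
  # sum of column k of X over the rows drawn in sample b.
  crossprod(X, counts)
}

set.seed(1)
X <- matrix(rnorm(50 * 3), ncol = 3)   # 50 observations, 3 variables
B <- 1000
idx <- matrix(sample.int(nrow(X), 10 * B, replace = TRUE), nrow = 10)
res <- boot_sums_counts(X, idx)         # 3 x 1000

# A vector is just the one-column case.
set.seed(1)
d <- data.frame(x = sample(c(10:14, 100), 50, replace = TRUE))
res_x <- boot_sums_counts(d$x, idx)     # 1 x 1000
```

The counts matrix has one row per observation and one column per sample. Its memory use therefore grows with n*B. For a single vector, the plain `colSums` approach is simpler.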

Tim Hesterberg

>It's easy to measure whether taking all the samples at once is more
>efficient:
>
> > set.seed(1)
> > d <- data.frame(x=sample(c(10:14,100),50,rep=T))
> > sys.time(res <- sapply(1:10000, function(i, d) sum(sample(d$x, 10, rep=T)),
> +                         d))
>[1] 5.766 5.844
> > sys.time(res <- colSums(matrix(sample(d$x, 10*10000, rep=T),
> +                                 ncol=10000)))
>[1] 0.031 0.047
> >
>
>(Here I computed 10,000 sums rather than 1,000 in order to get
>measurable times.)
>
>In general, I find it very worthwhile to measure things that "should" be
>faster in S-PLUS, because the results can sometimes be quite surprising.
>
>-- Tony Plate
>
>David L Lorenz wrote:
>> 
>> Hi,
>>   It is probably a little more efficient to take all of the samples and 
>> then process them. So another way:
>> 
>> res <- colSums(matrix(sample(d$x, 10*1000, rep=T), ncol=1000))
>> 
>> Dave
>> 
>> 
>> From:    Tony Plate <tplate@blackmesacapital.com>
>> Sent by: s-news-owner@lists.biostat.wustl.edu
>> Date:    04/20/2006 07:13 PM
>> To:      "Khan, Sohail" <khan@cshl.edu>
>> cc:      s-news@lists.biostat.wustl.edu
>> Subject: Re: [S] Generating a vector dynamically
>> 
>> Here's one way to do what I think you want:
>> 
>>  > set.seed(1)
>>  > # Generate a sample data frame with one column 'x'
>>  > d <- data.frame(x=sample(c(10:14,100),50,rep=T))
>>  > # Create a vector of 1000 sums, each the sum of a sample
>>  > # of size 10 drawn with replacement
>>  > res <- sapply(1:1000, function(i) sum(sample(d$x, 10, rep=T)))
>>  > hist(res)
>>  >
>> 
>> Khan, Sohail wrote:
>>  > Dear List
>>  >
>>  > I want to write a "little" piece of code that would:
>>  >
>>  > -- take a random sample of 10 values from a data.frame column
>>  > -- sum these values
>>  > -- put the sum in a vector
>>  > -- repeat 1000 times
>>  >
>>  > I would later draw a histogram of this generated vector.
>>  > Thanks in advance for any advice/suggestions.
>>  >
>>  > Sohail Khan
>>  > Scientific Programmer
>>  > COLD SPRING HARBOR LABORATORY
>>  > Genome Research Center
>>  > 500 Sunnyside Boulevard
>>  > Woodbury, NY 11797
>>  > (516)422-4076

