
Problem with sample()

To: "S-News List" <s-news@lists.biostat.wustl.edu>
Subject: Problem with sample()
From: John Fennick <jhf2@adelphia.net>
Date: Sun, 20 Nov 2005 15:07:54 -0500

Hello Group
(I also sent this to S-Plus support, but I wonder whether this is an old issue or whether anyone else has experienced it.)

Using S version 7.2 under Win 2000.
I am assuming that sample() and rsample() select a "truly" random sample from the given population. If this is not true, then my question is moot.
It is this: if I generate a binomial sequence using sample() or rsample() and compare the sample variance with the theoretical value, or with sequences generated using rbinom() or runif(), I get an incorrect result. The latter two are consistent and agree with theory: binomial variance = Npq.
The means from all three methods used for the sequences agree and are correct, Np.
Is there a problem with sample(), or what am I missing?

Thanks,
john

EXAMPLE
> dim(tt)
[1] 2048  100
>        # tt[1,] <- c(rep(0,70),rep(1,30)) is repeated in 2048 rows.
>        # Each row represents a sequence with probability 0.30 of a "1" appearing;
>        # it just needs to be randomized for testing.
>        # For a seq. of length 30, the binomial variance should be Npq = 30 * 0.3 * 0.7 = 6.3
>                # Use tt to create binomial sequences.
>        # First, generate a binom. seq. using sample()
>
> zz <- apply(tt,1,function(x){ sum(sample(x,30)) })
> length(zz)    # check sample size
[1] 2048
> var(zz)       # What is the variance?
[1] 4.420762    # "wrong"

>                # Next generate a formal binom. seq. of same size using rbinom()
> xx <- (rbinom(2048,30,0.3))
> var(xx)       # What is the variance?
[1] 6.310369    # "correct"

>                # Now generate binom. seq. using runif()
>       yy <- apply(tt,1,function(x){ sum(x[rnd(runif(30,0.5,100.5),0)]) })
> var(yy)       # What is the variance?
[1] 6.322804    # "correct"

>                # Summary:
> var(zz)
[1] 4.420762
> var(xx)
[1] 6.310369
> var(yy)
[1] 6.322804
# Npq = 30 * 0.3 * 0.7 = 6.3

These results repeat when rsample() is used instead of sample() for zz, and with up to several thousand samples.
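
A possible explanation, offered as a sketch rather than a confirmed diagnosis: sample(x, 30) draws without replacement by default, so the count of 1s in 30 draws from a 100-element row would follow a hypergeometric rather than a binomial distribution. Its variance is Npq times the finite-population correction (M - N)/(M - 1), where M = 100 is the row length (M is a symbol introduced here, not used above). That gives 6.3 * 70/99, or about 4.45, which is close to the observed 4.42. The snippet below is written in R syntax and assumes S-PLUS behaves the same way; pop, n_draw, n_rows, and the other variable names are illustrative only.

```r
# Sketch: compare sampling with and without replacement (R syntax;
# assumed, not verified, to carry over to S-PLUS).
set.seed(1)

pop    <- c(rep(0, 70), rep(1, 30))  # one row of tt: 70 zeros, 30 ones
M      <- length(pop)                # population size (100)
n_draw <- 30                         # draws per row (N in the text above)
n_rows <- 2048                       # number of simulated rows
p      <- mean(pop)                  # 0.3
q      <- 1 - p

# Theoretical variances
var_binom <- n_draw * p * q                             # Npq = 6.3
var_hyper <- n_draw * p * q * (M - n_draw) / (M - 1)    # about 4.45

# Without replacement (the default): hypergeometric counts
zz_norep <- replicate(n_rows, sum(sample(pop, n_draw)))

# With replacement: binomial counts
zz_rep <- replicate(n_rows, sum(sample(pop, n_draw, replace = TRUE)))

print(c(theory_hyper = var_hyper, simulated_norep = var(zz_norep)))
print(c(theory_binom = var_binom, simulated_rep   = var(zz_rep)))
```

If this reading is right, passing replace = TRUE to sample() in the apply() call for zz should bring its variance in line with rbinom() and the runif() indexing, both of which effectively sample with replacement.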

John Fennick
jhf2@adelphia.net
alt: j.fennick@ieee.org
Tel: 603.526.4023
134 Brookside Drive
New London, NH 03257
USA
