
Re: documentation for 'minimal' option of 'sample' command

To: "Christopher G. Green (L)" <cggreen@u.washington.edu>
Subject: Re: documentation for 'minimal' option of 'sample' command
From: Tim Hesterberg <timh@insightful.com>
Date: Tue, 19 Feb 2008 20:30:01 -0800
Cc: <s-news@lists.biostat.wustl.edu>
In-reply-to: <002d01c87376$87100b90$9602a8c0@cgglappie> (cggreen@u.washington.edu)
References: <002d01c87376$87100b90$9602a8c0@cgglappie>
>Does any one know of a reference that explains the algorithm used for the
>'minimal' option to the 'sample' command in S-Plus 7 and higher? The help
>page for 'sample' does not give a reference; nor does the source code for
>'sample.default'.

First, a quick plug: sampling with minimal replacement would be useful
in many situations where people currently use sampling with replacement,
for example in SIR (sampling importance-resampling), to convert
a set of observations with unequal weights to a set with equal weights.


In the case of sampling with equal probabilities:

If 'minimal=TRUE', then sampling is with minimal replacement: if
'size>n', then each observation is included 'size %/% n' times, and
the remaining 'size %% n' draws are taken without replacement.
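
To make the equal-probability case concrete, here is a minimal sketch in
Python (not S-PLUS, and not the actual 'sample' source). The function name
'sample_minimal_equal', the 0-based indices, and the final shuffle are my
own assumptions for illustration.

```python
import random

def sample_minimal_equal(n, size, rng=random):
    """Draw `size` indices from range(n) with minimal replacement.

    Hypothetical helper for illustration only; not the S-PLUS code.
    """
    # full plays the role of size %/% n, rest the role of size %% n
    full, rest = divmod(size, n)
    out = list(range(n)) * full          # every index appears `full` times
    out += rng.sample(range(n), rest)    # remaining draws, without replacement
    rng.shuffle(out)                     # assumed: return in random order
    return out

if __name__ == "__main__":
    # 13 draws from 5 items: each item appears 2 or 3 times
    print(sample_minimal_equal(5, 13))
```

With n = 5 and size = 13, every index appears twice, and three of the five
indices appear a third time.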

In the case of unequal probabilities:

If 'minimal=TRUE', then the probability of selecting observation 'j'
equals 'size*prob[j]' (when this exceeds 1, it is the expected number
of times 'j' is selected), and every permutation of a set of outcomes
has the same probability of being chosen (assuming 'order=TRUE').  The
algorithm randomly permutes the values and corresponding probabilities,
divides the unit interval into blocks of length proportional to 'prob',
takes a systematic sample of 'size' points on the unit interval
(equally spaced, with a uniform random start), selects an observation
once for each point that falls into its block, then does a final random
permutation (if 'order=TRUE').  "Minimal replacement" is not taken
literally in the unequal probability case: if 'size*max(prob)>1' then
duplicates may occur, and if 'size*max(prob)>2' then duplicates are
guaranteed.
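
The steps in the unequal-probability paragraph can be sketched roughly as
follows, again in Python rather than S-PLUS. This is an illustration under
my own assumptions, not the actual implementation: the function name
'sample_minimal_unequal' is hypothetical, indices are 0-based, 'prob' is
normalized to sum to 1, and the systematic sample is taken to be 'size'
equally spaced points with a single uniform random start.

```python
import random

def sample_minimal_unequal(prob, size, rng=random, order=True):
    """Systematic sample of `size` indices; index j is expected to
    appear size*prob[j] times (after normalizing prob).

    Hypothetical helper for illustration only; not the S-PLUS code.
    """
    total = sum(prob)                     # assumed: normalize prob to sum to 1
    idx = list(range(len(prob)))
    rng.shuffle(idx)                      # randomly permute the observations
    start = rng.random() / size           # uniform start in [0, 1/size)
    points = [start + k / size for k in range(size)]  # equally spaced points
    out, upper, pos = [], 0.0, 0
    for j in idx:
        upper += prob[j] / total          # right edge of the block for j
        # select j once for every point that falls inside its block
        while pos < size and points[pos] < upper:
            out.append(j)
            pos += 1
    if pos < size:
        # floating-point rounding can leave points past the last edge;
        # assign them to the last block
        out.extend([idx[-1]] * (size - pos))
    if order:
        rng.shuffle(out)                  # final random permutation
    return out

if __name__ == "__main__":
    # size*max(prob) = 2.1 > 2, so index 0 appears at least twice
    print(sample_minimal_unequal([0.7, 0.1, 0.1, 0.1], 3))
```

Because the points are 1/size apart, a block of length prob[j] holds either
the floor or the ceiling of size*prob[j] points, which is where the "may
occur" and "guaranteed" thresholds for duplicates come from.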

The above text is from the help file for the next version of S-PLUS.
If you have suggestions for improvement please let me know.

Tim Hesterberg

========================================================
| Tim Hesterberg       Senior Research Scientist       |
| timh@insightful.com  Insightful Corp.                |
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)  Seattle, WA 98109-3044, U.S.A.  |
|                      www.insightful.com/Hesterberg   |
========================================================
Download S+Resample from www.insightful.com/downloads/libraries

I'll teach short courses:
  Advanced Programming in S-PLUS: San Antonio TX, March 26-27, 2008.
  Bootstrap Methods and Permutation Tests: San Antonio, March 28, 2008.
Links for more info are at www.insightful.com/Hesterberg
