s-news
[Top] [All Lists]

Re: [S] all possible combinations

To: "Ken Banks" <kbanks@chickasaw.com>
Subject: Re: [S] all possible combinations
From: Douglas Bates <bates@stat.wisc.edu>
Date: 20 Sep 2000 08:43:57 -0500
Cc: <s-news@wubios.wustl.edu>
In-reply-to: "Ken Banks"'s message of "Tue, 19 Sep 2000 08:21:10 -0700"
References: <NDBBLNLBDJNJNMDECCFCKEBBCDAA.kbanks@chickasaw.com>
Sender: owner-s-news@wubios.wustl.edu
User-agent: Gnus/5.0803 (Gnus v5.8.3) Emacs/20.7
"Ken Banks" <kbanks@chickasaw.com> writes:

> I need to produce a dataset containing all possible combinations of the on
> or off states of 20 binary variables.   This should produce 2^20 or
> 1,048,576 possible combinations.  Does anyone have any hints for an
> efficient way to accomplish this task?  Is it possible using SPlus?  Any
> hints are greatly appreciated!

I haven't seen an answer to this yet so I will venture a few comments.

First of all, you probably don't want to do that because you can't
conveniently store the result.  The smallest "atomic"
element stored in S-PLUS is an integer, which is 4 bytes on most
machines but 8 bytes on 64-bit machines like the Compaq Alpha's.  Even
if you were careful to ensure that the subsets you stored were
represented as integers, you would need 2^20 * 20 * 4 bytes or about
80 MB _per copy_.  I say per copy because you will often create copies
of data while working on the data.  You must know a lot about the
internals of individual functions to be able to circumvent this.

The alternative is to generate the subsets one at a time, do whatever
you need with the subset, and store the result.

A simple way of generating the possible subsets is to use the binary
representation of the numbers from 0 to 2^20-1.  The only trick is how
to get the binary representations.  In particular, it is best to avoid
using loops when doing this and to work on all the positions at once.

My best shot so far has been to pre-compute the powers of 2 and to use
the modulus operator and diff.

> pwrs <- 2^(0:20)
> for (i in 0:20) print(ifelse(diff(i %% pwrs), 1, 0)) # to show it works
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [1] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [1] 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [1] 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [1] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [1] 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [1] 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [1] 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [1] 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [1] 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [1] 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [1] 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [1] 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [1] 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [1] 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [1] 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [1] 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [1] 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [1] 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [1] 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The critical part is diff(i %% pwrs).  This will be a vector of length
20 with zero in the positions that are not in the subset and
non-zero values in those that are in the subset.  If you had a vector
called samp and you wanted all the possible subsets of samp you could
use

 for (i in 1:2^20) {
    subset <- samp[diff(i %% pwrs) != 0]
    ...
 }

Even then you would probably overflow the memory before you could get
through all the possible subsets.  You may be able to get away with
this in R, which does garbage collection.
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message:  unsubscribe s-news

<Prev in Thread] Current Thread [Next in Thread>