s-news
[Top] [All Lists]

[S] all possible combinations - summary

To: <s-news@wubios.wustl.edu>
Subject: [S] all possible combinations - summary
From: "Ken Banks" <kbanks@chickasaw.com>
Date: Wed, 20 Sep 2000 08:45:00 -0700
Importance: Normal
Sender: owner-s-news@wubios.wustl.edu
Dear All,

Here is a summary of my post on 9-19-00 concerning all possible combinations
of a 2^20 binary variable dataset.  The original question was:


<<I need to produce a dataset containing all possible combinations of the on
or off states of 20 binary variables.   This should produce 2^20 or
1,048,576 possible combinations.  Does anyone have any hints for an
efficient way to accomplish this task?  Is it possible using SPlus?  Any
hints are greatly appreciated!>>


Many thanks to John Gavin, Patrick Connolly, Nick Ellis, Tim Hesterberg,
Charles Berry, Michael Ernst, Greg Snow, Peter Wolf, Scott Chasalow, Andy
Liaw, Ronald Fehd, Matt Calder, Dennis Murphy, Ursula Becker, Bert Gunter,
Daniel Heitjan, S.D. Beyers, and Bjarke Klein.  I greatly appreciate all
taking the time to reply to my question.  Memory is still a problem for most
solutions, which is related in part to my whimpy computer.  Although I
haven't produced the full dataset, the looping solution of Bjarke Klein may
solve this problem.

Thanks again!

-Ken Banks.


>From John Gavin,

states  <- function(n) expand.grid( lapply(seq(n), function(x) 0:1) )

seems to work but seems very memory ineffecient.

resources(ans <- states(20))
> resources(ans <- states(20))
 secs   working cache
 6.65 190725264  9492



>From Bjarke Klein
Pick a random integer between 0 and 2^20 (both included)
Convert this number to binary representation
This will correspond to a series of 20 binary observations.

I do not know whether this solves your problem

I have included a code-example:

# Function to change from radix 10 to radix 2
# n must be the minimum n such that 2^n>number
dec2bin <- function(number, n)
{
        out <- rep(0,n)
        for (i in n:1)
        {
                if (number >= 2^i)
                        {
                                out[i] <- 1
                                number <- number-2^i
                        }
        }
out
}

# example:

# number corresponds to a combination
number <- round(runif(1,0,2^20),0)

number

dec2bin(number,20)

In this way you can generate the entire data set by looping through the
numbers from 0 to 2^20

from Patrick Connolly
Just a hint.  Try expand.grid().

from Nick Ellis
One way is with expand.grid, something like this:

expand.grid(v1=0:1, v2=0:1, v3=0:1, <and so on until>, v20=0:1)

This gives a dataframe with 20 columns and 2^20 rows.

Another way is to calculate each of the 20 variables from the corresponding
bit of the sequence 0:(2^20 - 1), something like this:

bigseq <- 0:(2^20-1)
data.frame(v1=bigseq %% 2, v2=(bigseq %/% 2) %% 2, v3=(bigseq %/% 4) %% 2,
<and so on until>, v20=(bigseq %/% (2^19)) %% 2)

Of course there are slicker ways of doing this without writing things out 20
times, e.g. (first method)

n <- 20
arglist <- vector("list",n)
names(arglist) <- paste("v",1:n,sep="")
for (i in 1:n) arglist[[i]] <- 0:1
result <- do.call("expand.grid",arglist)

and (second method)

n <- 20
bigseq <- 0:(2^n-1)
bigmat <- matrix(bigseq,nrow=2^n,ncol=n)
bigdiv <- matrix(2^(1:n - 1),nrow=2^n,ncol=n,byrow=T)
result <- (bigmat %/% bigdiv) %% 2
dimnames(result) <- list(NULL,paste("v",1:n,sep=""))

>From Tim Hesterberg
Here's an function to produce base-2 representations.
You would use it as:
        base2(0:(2^19-1))

base2 <- function(m)
{
  ##Express m as powers of 2; m may be vector
  n <- length(m)
  maxi <- floor(log(max(m)/log(2)))
  result <- matrix( 0, n, maxi+1 )
  for(i in 1:(maxi+1)){
    result[,i] <- m %% 2
    m <- m %/% 2
  }
  result
}

Another approach uses cbind() and rep(..., each=2):

y <- matrix(0:1,2)
for(i in 1:5)
  y <- cbind(rbind(y,y), rep(0:1, each=2^i))
y

>From Charles Berry
bits.20 <- outer(seq(0,length=20^2),2^(19:0),function(x,y) x %/% y %% 2)


>From Michael Ernst
The following function, from page 147 of the 3rd edition of VR, returns
a matrix with each column containing the binary representation of each
number in the vector "x" with "digits" number of digits.

binary.v <-
function(x, digits)
{
        if(missing(digits)) {
                mx <- max(x)
                digits <- if(mx > 0) 1 + floor(log(mx, base = 2)) else 1
        }
        ans <- 0:(digits - 1)
        lx <- length(x)
        x <- rep(x, rep(digits, lx))
        x <- (x %/% 2^ans) %% 2
        dim(x) <- c(digits, lx)
        x
}

And binary.v(0:1048575,20) would give you what you want.


>From Greg Snow
try:

temp <- list(A=0:1)
for (i in 2:20) temp[[ LETTERS[i] ]] <- 0:1
temp2 <- expand.grid( temp )
dim(temp2)

don't expect it to be fast, and you had better have a lot of memory.

>From Peter Wolf

here is an old solution to get the 2^n patterns as rows of a matrix

comb<-function(n){
 comb<-NULL
 for( i in 1:n) comb<-rbind(cbind(1,comb),cbind(0,comb))
 comb
}

>From Scott Chasalow
The following might work for you, though 2^20 is pushing it.

Under Splus 4.5 for Windows, running under Win NT4 on a Pentium II 300Mhz
with 128MB RAM:

> hcube(rep(2, 3), translation = -1)
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    1    0    0
[3,]    0    1    0
[4,]    1    1    0
[5,]    0    0    1
[6,]    1    0    1
[7,]    0    1    1
[8,]    1    1    1

> options(object.size = 20000000)
> dos.time(hcube(rep(2, 2), translation = -1))
[1] 0.01998901
> dos.time(hcube(rep(2, 4), translation = -1))
[1] 0.02999878
> dos.time(hcube(rep(2, 8), translation = -1))
[1] 0.04000854
> dos.time(hcube(rep(2, 12), translation = -1))
[1] 0.1400146
> dos.time(hcube(rep(2, 14), translation = -1))
[1] 0.5299988
> dos.time(hcube(rep(2, 16), translation = -1))
[1] 2.279999
> dos.time(hcube(rep(2, 17), translation = -1))
[1] 18.56
> dos.time(hcube(rep(2, 18), translation = -1))
Error in data[1:ll] <- old: Cannot allocate 37748960 bytes: options("object.
size") is 20000000: see options help file
Dumped
Timing stopped at: 2.84

It appears that my machine needs to start swapping at 2^17, so things really
slow down.  I think I'd have serious problems with a 2^20 x 20 matrix; I
don't even want to try it.  But on a fast machine with lots of RAM, it might
be doable in Splus.  Maybe hcube() could be made less memory-hungry too.

If I HAD to do this in Splus, and I didn't need the whole matrix at once, I
might be inclined to work with it in pieces, e.g. by cbinding hcube(rep(2,
16), translation = -1) to the appropriate submatrices of the last four
columns (which can be generated by replicating rows of hcube(rep(2, 4),
translation = -1)).

You can find the hcube() function in my combinat module,

http://www.spg.wau.nl/pv/pub/chasalow/S/win/combinat

or

http://www.spg.wau.nl/pv/pub/chasalow/S/unix/combinat

>From Andy Liaw

Try expand.grid(x1,x2,x3,...), where x1, x2, ... are, for example, c(0,1).

>From Ronald Fehd
Here is a SAS program that will create your data set.

%macro makedata(N);
data B&N.;
length default = 4;
%DO I = 1 %TO &N.;
 do B&I = 0,1;    %END;
output;
%DO I = 1 %TO &N.;
 end;    %END;
proc CONTENTS data = B&N.;
run; %MEND;
%MAKEDATA(3);

>From Matt Calder

If you use C this is easy, just test bits in the integers 0 - (2^21-1).
Depending on the particulars of your problem, a simple .C call could give
you
what you want. Keep in mind that iterating over 2^21 values in S is going to
be slow.

>From Dennis Murphy
Here's a version of Bill Venables' subsets program, which does what you're
looking for:

Subsets <- function(r, n, v = 1:n)
  ## S3/S4 version.  Generate subsets, caching results in frame 0
  if (r <= 0) vector(mode(v), 0) else
  if (r >= n) v[1:n] else {
    if(r > 1) {
      i1 <- paste(".", n-1, r-1, sep="/")
      i2 <- paste(".", n-1, r,   sep="/")
      if(!exists(i1, frame = 0))
        assign(i1, as.vector(Subsets(r-1, n-1, 1:(n-1))), frame = 0)
      if(!exists(i2, frame = 0))
        assign(i2, as.vector(Subsets(r,   n-1, 1:(n-1))), frame = 0)
      m1 <- matrix(v[-1][get(i1, frame = 0)], ncol = r-1)
      m2 <- matrix(v[-1][get(i2, frame = 0)], ncol = r)
    } else {
      m1 <- NULL
      m2 <- matrix(v[2:n], ncol = 1)
  }
    rbind(cbind(v[1], m1), m2)
  }

It's possible that swap space and time may be issues with this function
given the number of pairs requested.

>From Ursula Becker,
the function fac.design should do this.
assign
fnames_list(one=c("0","1"), two=..., twenty=c("0","1")), then
test_fac.design(rep(2,20),factor.names=fnames)

Or if you don't care about the column names:
fnames_rep(list(one=c("0","1")),20)


>From Bert Gunter,
Look in the S-Plus archives for Bill Venables' subsets() (or subset() ? )
function. Or use mine (below). Finding all possible subsets is the same as
producing all binary sequences. Note that both Bill's and my solutions use
recursion, which is a very natural way to do this, but sometimes slow
because it is not very efficiently implemented in S-Plus. For my
application, this made no difference.

>From Daniel Hietjan
I would do it in Fortran.  It will be many times faster.


>From S.D. Beyers
temp <- fac.design(levels=rep(2,20))


will give you that matrix but you are throwing about a matrix as large as
you create. A matrix 1M by 20 will always be big in Splus, whether you
generate it in the fly or in one go.

simplified form can be got thus

result  <- matrix(unlist(temp)-1, ncol=20, byrow=F)

The above commands work on my machine and if nothing else are effient
in terms of programming.


-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message:  unsubscribe s-news

<Prev in Thread] Current Thread [Next in Thread>