s-news
[Top] [All Lists]

Re: [S] all possible combinations - summary

To: "Ken Banks" <kbanks@chickasaw.com>, <s-news@wubios.wustl.edu>
Subject: Re: [S] all possible combinations - summary
From: Bjarke Klein <bjarke@statdem.sdu.dk>
Date: Thu, 21 Sep 2000 10:52:39 +0200
In-reply-to: <NDBBLNLBDJNJNMDECCFCIEBJCDAA.kbanks@chickasaw.com>
Sender: owner-s-news@wubios.wustl.edu
Dear all,

Regarding my solution to the 'all possible combinations question'

The function dec2bin didn't do what it was supposed to do:

Here is the correct version:

dec2bin <- function(number, n)
{
        out <- rep(0,n)
        for (i in (n-1):0)
        {
                if (number >= 2^i) 
                        {
                                out[i+1] <- 1
                                number <- number-2^i
                        }
        }
out
}

Sorry about that

Bjarke



At 08:45 20-09-00 -0700, Ken Banks wrote:
>
>Dear All,
>
>Here is a summary of my post on 9-19-00 concerning all possible combinations
>of a 2^20 binary variable dataset.  The original question was:
>
>
><<I need to produce a dataset containing all possible combinations of the on
>or off states of 20 binary variables.   This should produce 2^20 or
>1,048,576 possible combinations.  Does anyone have any hints for an
>efficient way to accomplish this task?  Is it possible using SPlus?  Any
>hints are greatly appreciated!>>
>
>
>Many thanks to John Gavin, Patrick Connolly, Nick Ellis, Tim Hesterberg,
>Charles Berry, Michael Ernst, Greg Snow, Peter Wolf, Scott Chasalow, Andy
>Liaw, Ronald Fehd, Matt Calder, Dennis Murphy, Ursula Becker, Bert Gunter,
>Daniel Heitjan, S.D. Beyers, and Bjarke Klein.  I greatly appreciate all
>taking the time to reply to my question.  Memory is still a problem for most
>solutions, which is related in part to my whimpy computer.  Although I
>haven't produced the full dataset, the looping solution of Bjarke Klein may
>solve this problem.
>
>Thanks again!
>
>-Ken Banks.
>
>
>>From John Gavin,
>
>states <- function(n) expand.grid( lapply(seq(n), function(x) 0:1) )
>
>seems to work but seems very memory ineffecient.
>
>resources(ans <- states(20))
>> resources(ans <- states(20))
> secs   working cache
> 6.65 190725264  9492
>
>
>
>>From Bjarke Klein
>Pick a random integer between 0 and 2^20 (both included)
>Convert this number to binary representation
>This will correspond to a series of 20 binary observations.
>
>I do not know whether this solves your problem
>
>I have included a code-example:
>
># Function to change from radix 10 to radix 2
># n must be the minimum n such that 2^n>number
>dec2bin <- function(number, n)
>{
>       out <- rep(0,n)
>       for (i in n:1)
>       {
>               if (number >= 2^i)
>                       {
>                               out[i] <- 1
>                               number <- number-2^i
>                       }
>       }
>out
>}
>
># example:
>
># number corresponds to a combination
>number <- round(runif(1,0,2^20),0)
>
>number
>
>dec2bin(number,20)
>
>In this way you can generate the entire data set by looping through the
>numbers from 0 to 2^20
>
>from Patrick Connolly
>Just a hint.  Try expand.grid().
>
>from Nick Ellis
>One way is with expand.grid, something like this:
>
>expand.grid(v1=0:1, v2=0:1, v3=0:1, <and so on until>, v20=0:1)
>
>This gives a dataframe with 20 columns and 2^20 rows.
>
>Another way is to calculate each of the 20 variables from the corresponding
>bit of the sequence 0:(2^20 - 1), something like this:
>
>bigseq <- 0:(2^20-1)
>data.frame(v1=bigseq %% 2, v2=(bigseq %/% 2) %% 2, v3=(bigseq %/% 4) %% 2,
><and so on until>, v20=(bigseq %/% (2^19)) %% 2)
>
>Of course there are slicker ways of doing this without writing things out 20
>times, e.g. (first method)
>
>n <- 20
>arglist <- vector("list",n)
>names(arglist) <- paste("v",1:n,sep="")
>for (i in 1:n) arglist[[i]] <- 0:1
>result <- do.call("expand.grid",arglist)
>
>and (second method)
>
>n <- 20
>bigseq <- 0:(2^n-1)
>bigmat <- matrix(bigseq,nrow=2^n,ncol=n)
>bigdiv <- matrix(2^(1:n - 1),nrow=2^n,ncol=n,byrow=T)
>result <- (bigmat %/% bigdiv) %% 2
>dimnames(result) <- list(NULL,paste("v",1:n,sep=""))
>
>>From Tim Hesterberg
>Here's an function to produce base-2 representations.
>You would use it as:
>       base2(0:(2^19-1))
>
>base2 <- function(m)
>{
>  ##Express m as powers of 2; m may be vector
>  n <- length(m)
>  maxi <- floor(log(max(m)/log(2)))
>  result <- matrix( 0, n, maxi+1 )
>  for(i in 1:(maxi+1)){
>    result[,i] <- m %% 2
>    m <- m %/% 2
>  }
>  result
>}
>
>Another approach uses cbind() and rep(..., each=2):
>
>y <- matrix(0:1,2)
>for(i in 1:5)
>  y <- cbind(rbind(y,y), rep(0:1, each=2^i))
>y
>
>>From Charles Berry
>bits.20 <- outer(seq(0,length=20^2),2^(19:0),function(x,y) x %/% y %% 2)
>
>
>>From Michael Ernst
>The following function, from page 147 of the 3rd edition of VR, returns
>a matrix with each column containing the binary representation of each
>number in the vector "x" with "digits" number of digits.
>
>binary.v <-
>function(x, digits)
>{
>        if(missing(digits)) {
>                mx <- max(x)
>                digits <- if(mx > 0) 1 + floor(log(mx, base = 2)) else 1
>        }
>        ans <- 0:(digits - 1)
>        lx <- length(x)
>        x <- rep(x, rep(digits, lx))
>        x <- (x %/% 2^ans) %% 2
>        dim(x) <- c(digits, lx)
>        x
>}
>
>And binary.v(0:1048575,20) would give you what you want.
>
>
>>From Greg Snow
>try:
>
>temp <- list(A=0:1)
>for (i in 2:20) temp[[ LETTERS[i] ]] <- 0:1
>temp2 <- expand.grid( temp )
>dim(temp2)
>
>don't expect it to be fast, and you had better have a lot of memory.
>
>>From Peter Wolf
>
>here is an old solution to get the 2^n patterns as rows of a matrix
>
>comb<-function(n){
> comb<-NULL
> for( i in 1:n) comb<-rbind(cbind(1,comb),cbind(0,comb))
> comb
>}
>
>>From Scott Chasalow
>The following might work for you, though 2^20 is pushing it.
>
>Under Splus 4.5 for Windows, running under Win NT4 on a Pentium II 300Mhz
>with 128MB RAM:
>
>> hcube(rep(2, 3), translation = -1)
>     [,1] [,2] [,3]
>[1,]    0    0    0
>[2,]    1    0    0
>[3,]    0    1    0
>[4,]    1    1    0
>[5,]    0    0    1
>[6,]    1    0    1
>[7,]    0    1    1
>[8,]    1    1    1
>
>> options(object.size = 20000000)
>> dos.time(hcube(rep(2, 2), translation = -1))
>[1] 0.01998901
>> dos.time(hcube(rep(2, 4), translation = -1))
>[1] 0.02999878
>> dos.time(hcube(rep(2, 8), translation = -1))
>[1] 0.04000854
>> dos.time(hcube(rep(2, 12), translation = -1))
>[1] 0.1400146
>> dos.time(hcube(rep(2, 14), translation = -1))
>[1] 0.5299988
>> dos.time(hcube(rep(2, 16), translation = -1))
>[1] 2.279999
>> dos.time(hcube(rep(2, 17), translation = -1))
>[1] 18.56
>> dos.time(hcube(rep(2, 18), translation = -1))
>Error in data[1:ll] <- old: Cannot allocate 37748960 bytes: options("object.
>size") is 20000000: see options help file
>Dumped
>Timing stopped at: 2.84
>
>It appears that my machine needs to start swapping at 2^17, so things really
>slow down.  I think I'd have serious problems with a 2^20 x 20 matrix; I
>don't even want to try it.  But on a fast machine with lots of RAM, it might
>be doable in Splus.  Maybe hcube() could be made less memory-hungry too.
>
>If I HAD to do this in Splus, and I didn't need the whole matrix at once, I
>might be inclined to work with it in pieces, e.g. by cbinding hcube(rep(2,
>16), translation = -1) to the appropriate submatrices of the last four
>columns (which can be generated by replicating rows of hcube(rep(2, 4),
>translation = -1)).
>
>You can find the hcube() function in my combinat module,
>
>http://www.spg.wau.nl/pv/pub/chasalow/S/win/combinat
>
>or
>
>http://www.spg.wau.nl/pv/pub/chasalow/S/unix/combinat
>
>>From Andy Liaw
>
>Try expand.grid(x1,x2,x3,...), where x1, x2, ... are, for example, c(0,1).
>
>>From Ronald Fehd
>Here is a SAS program that will create your data set.
>
>%macro makedata(N);
>data B&N.;
>length default = 4;
>%DO I = 1 %TO &N.;
> do B&I = 0,1;    %END;
>output;
>%DO I = 1 %TO &N.;
> end;    %END;
>proc CONTENTS data = B&N.;
>run; %MEND;
>%MAKEDATA(3);
>
>>From Matt Calder
>
>If you use C this is easy, just test bits in the integers 0 - (2^21-1).
>Depending on the particulars of your problem, a simple .C call could give
>you
>what you want. Keep in mind that iterating over 2^21 values in S is going to
>be slow.
>
>>From Dennis Murphy
>Here's a version of Bill Venables' subsets program, which does what you're
>looking for:
>
>Subsets <- function(r, n, v = 1:n)
>  ## S3/S4 version.  Generate subsets, caching results in frame 0
>  if (r <= 0) vector(mode(v), 0) else
>  if (r >= n) v[1:n] else {
>    if(r > 1) {
>      i1 <- paste(".", n-1, r-1, sep="/")
>      i2 <- paste(".", n-1, r,   sep="/")
>      if(!exists(i1, frame = 0))
>        assign(i1, as.vector(Subsets(r-1, n-1, 1:(n-1))), frame = 0)
>      if(!exists(i2, frame = 0))
>        assign(i2, as.vector(Subsets(r,   n-1, 1:(n-1))), frame = 0)
>      m1 <- matrix(v[-1][get(i1, frame = 0)], ncol = r-1)
>      m2 <- matrix(v[-1][get(i2, frame = 0)], ncol = r)
>    } else {
>      m1 <- NULL
>      m2 <- matrix(v[2:n], ncol = 1)
>  }
>    rbind(cbind(v[1], m1), m2)
>  }
>
>It's possible that swap space and time may be issues with this function
>given the number of pairs requested.
>
>>From Ursula Becker,
>the function fac.design should do this.
>assign
>fnames_list(one=c("0","1"), two=..., twenty=c("0","1")), then
>test_fac.design(rep(2,20),factor.names=fnames)
>
>Or if you don't care about the column names:
>fnames_rep(list(one=c("0","1")),20)
>
>
>>From Bert Gunter,
>Look in the S-Plus archives for Bill Venables' subsets() (or subset() ? )
>function. Or use mine (below). Finding all possible subsets is the same as
>producing all binary sequences. Note that both Bill's and my solutions use
>recursion, which is a very natural way to do this, but sometimes slow
>because it is not very efficiently implemented in S-Plus. For my
>application, this made no difference.
>
>>From Daniel Hietjan
>I would do it in Fortran.  It will be many times faster.
>
>
>>From S.D. Beyers
>temp <- fac.design(levels=rep(2,20))
>
>
>will give you that matrix but you are throwing about a matrix as large as
>you create. A matrix 1M by 20 will always be big in Splus, whether you
>generate it in the fly or in one go.
>
>simplified form can be got thus
>
>result  <- matrix(unlist(temp)-1, ncol=20, byrow=F)
>
>The above commands work on my machine and if nothing else are effient
>in terms of programming.
>
>
>-----------------------------------------------------------------------
>This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
>send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
>message:  unsubscribe s-news
>
Bjarke Klein, Cand.Scient., Ph.D.-stud. 
Department of Statistics and Demography
University of Southern Denmark
Phone: +45 65 50 36 07  
http://www.statdem.sdu.dk/~bkl/
bjarke@statdem.sdu.dk 



-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message:  unsubscribe s-news

<Prev in Thread] Current Thread [Next in Thread>