s-news
[Top] [All Lists]

[S] Re. Query: combinations of groups

To: s-news@wubios.wustl.edu
Subject: [S] Re. Query: combinations of groups
From: Helen Clough <wilsonh@liverpool.ac.uk>
Date: Mon, 28 Feb 2000 11:51:38 +0000 (GMT Standard Time)
Sender: owner-s-news@wubios.wustl.edu
Dear all,

My original question was this:

> Suppose I have a set of n observations, and I want to 
> find out all possible unique ways in which I can split 
> these observations into r groups of equal size n/r.
 
> For example if I have n=4 observations {1, 2, 3, 4} and I 
> split these into r=2 groups I can have

> {1, 2} {3, 4}
> {1, 3} {2, 4}
> {1, 4} {2, 3}

> Can anyone suggest an efficient way I can do this in 
> Splus? I need something which can cope with any r and n 
> such that n/r is an integer. Every solution I have come 
> up with involves endless loops, which I realise is far 
> from ideal.

Thanks very much to Stephen Smith, Brian Ripley, Yuelin Li, 
Ian Clough, Renaud Lancelot and John Maindonald. The 
solution suggested by Professor Ripley concerning 
splitting the observations up into groups sequentially is 
probably the most relevant for my own purposes (since I am 
interested in all distinct *sets* of subsets, not simply 
all distinct *subsets* - I don't think I was clear enough 
about that in my original message; for example for the 
numbers 1,2,3,4 into two groups of size two, I would get 
three distinct possibilities, namely ({1,2},{3,4}), 
({1,3},{2,4}) and ({1,4},{2,3})). All the responses 
received have been useful along the way, so thanks to all. 

I've had a few requests from people to forward any 
solutions to the list, so they are included below:

-------------------------------------------------------------
See page 149 of Venables and Ripley (2nd ed) for the following function 

subsets<-function(r,n,v=1:n)
{
        if(r<=0) NULL else
        if(r>=n) v[1:n] else
        rbind(cbind(v[1],Recall(r-1,n-1,v[-1])), Recall(r, n-1,v[-1]))
}

e.g.,

> subsets(2,4)
     [,1] [,2] 
[1,]    1    2
[2,]    1    3
[3,]    1    4
[4,]    2    3
[5,]    2    4
[6,]    3    4
> 

(from Stephen Smith)
----------------------------------------------------------
Start with 2 groups, one of size k= n/r and one of size n-k, then
split off another k and so on.

For two groups, Bill Venables' subsets function does the trick.

subsets <- function(n, r, s = 1:n) {
  if(mode(n) != "numeric" || length(n) != 1
     || n < 1 || (n %% 1) != 0) stop("bad value of n")
  if(mode(r) != "numeric" || length(r) != 1
     || r < 1 || (r %% 1) != 0) stop("bad value of r")
  if(!is.atomic(s) || length(s) < n)
    stop("s is either non-atomic or too short")
  fun <- function(n, r, s)
    if(r <= 0) vector(mode(s), 0) else if(r >= n) s[1:n] else
    rbind(cbind(s[1], Recall(n - 1, r - 1, s[-1])),
          Recall(n - 1, r, s[-1]))
  fun(n, r, s)
}

Here n is n, r is k and s is the set of labels for the current group.

subsets(4, 2)
> subsets(4, 2)
     [,1] [,2] 
[1,]    1    2
[2,]    1    3
[3,]    1    4
[4,]    2    3
[5,]    2    4
[6,]    3    4

Since (I presume) you do not care about the group ordering you can always
put the minimum element in the first group, and thereby simplify this a bit.

(from Brian Ripley)
----------------------------------------------------------------
Hi, I have an example for the unique combinations of n choose 3.
Perhaps it might help you in solving the n choose r problem?

> x <- 1:9
> y <- cbind(rep(x, length(x)^2), rep(rep(x, each=length(x)), length(x)), 
       rep(x, each=length(x)^2))
> y2 <- y[c(y[,1] != y[,2] & y[,2]!=y[,3] & y[,1]!=y[,3] & y[,1] < y[,2] 
          & c(y[, 2] < y[, 3])), ]
> dim(y2)

(from Yuelin Li)
-----------------------------------------------------------------
An easy solution is to write a funtion like this found in VR (*) 2nd ed.
p. 149 or in the S-news archives:

subsets <- function(n, k, set = 1:n)
    if(k <= 0) NULL else
      if(k >= n) set else
        rbind(cbind(set[1], Recall(n - 1, k - 1, set[-1])),
        Recall(n - 1, k, set[-1]))
 
then:

> subsets(4, 2)
     [,1] [,2] 
[1,]    1    2
[2,]    1    3
[3,]    1    4
[4,]    2    3
[5,]    2    4
[6,]    3    4


Hope this helps,

Renaud

(*) Venables, W. N. and B. D. Ripley (1998). Modern applied statistics
with S-Plus. New York (USA), Springer-Verlag.

(from Renaud Lancelot)
---------------------------------------------------------------
I had the same dilemma myself with loops 
and generating permutations. Below are 2 emails, one from 
Professor Ripley and the other from Dr. Venables. They 
should help you decide whether S-plus will do the job ( i.e 
n,r aren't too large ) or whether like me you have to move 
to a C routine called from within S-plus. See Chapter 16 of 
the S-Plus 2000 Programmer's Guide for details on creating 
the DLL file. Credit should be given to messrs Ripley and 
Venables if this works for you ( and the S-Plus routine 
from Dr. Venables is the fastest I know of for an S-plus 
solution)

                                         Hope this helps,  
Ian Clough                                                 
cloughi@email.uc.edu 

Dr. Venables

                 Doing things in a loop can be rather 
costly in S, too, but if you need to generate all possible 
subsets sequentially here is *one*
way to do it.

(Code below; figuring out how it works, and improving it, 
remains an exercise for the casual onlooker.)

It returns the next subset at each call, and NULL when the
sequence is finished.  Here is an example of how you might 
use it:

> while(!is.null(m <- next.subset(5, 3))) 
  {
#  `m' holds the index of the subset
        print(m)
  }
[1] 1 2 3
[1] 1 2 4
[1] 1 2 5
[1] 1 3 4
[1] 1 3 5
[1] 1 4 5
[1] 2 3 4
[1] 2 3 5
[1] 2 4 5
[1] 3 4 5
> 

(Instead of "print(m)" you may wish to do something more
interesting...)

---(cut here)---
next.subset <- function(n, r) {
  if(exists(NR <- paste(n, r), frame = 0)) {
    nr <- get(NR, frame = 0)
    if(any(w <- (nr < (n-r+1):n))) {
      m <- max((1:r)[w])
      k <- m:r
      nr[k] <- k + nr[m] - m + 1
    }
    else {
      remove(NR, frame = 0)
      return(NULL)
    }
  }
  else
    nr <- 1:r
  assign(NR, nr, frame = 0)
  nr
}

cancel <- function(n, r)
  invisible(if(exists(nr <- paste(n, r), frame = 0))
            remove(nr, frame = 0))
---(cut here)---

If you wish to re-start the sequence you should use the 
function

> cancel(5, 3)

to go back to the beginning.

The function is just a way of presenting an algorithm.  If 
you really are committed to using a very long loop in S 
code, you probably should build in the next.subset 
computation as part of the loop rather than calling a 
function every time, but there may not be very much in it, 
especially if each step is going to be a pretty large 
computation, anyway.  This is the sort of territory where 
dynamically loaded code starts to become unavoidable.


Professor Ripley

...there is C code to do precisely that in the lqs function 
in the V&R3 MASS 
library, as in

/*
   Find all subsets of size k in order: this gets a new one 
each call */ void next_set(Sint * x, int n, int k) {    int 
i, j, tmp;

    j = k - 1;
    tmp = x[j]++;
    while (j > 0 && x[j] >= n - (k - 1 - j)) tmp = 
++x[--j];     for (i = j + 1; i < k; i++) x[i] = ++tmp;
}
and you will need to look at the code to see how it is 
initialized (x is 1:k).  (This is often called a Gray code 
algorithm.)

And although I originally wrote this as a loop in S, doing 
100000 subsets meant that I needed to do it all in C.

(from Ian Clough)
-----------------------------------------------------------
Helen -
You seem to be asking for a special case of the obtaining
of all nCm combinations.  I cannot now remember where I found
the following, probably on statlib:

> combn
function(x, m, fun = NULL, simplify = T, ...)
{
#       DATE WRITTEN: 14 April 1994          LAST REVISED:  10 July 1995
#       AUTHOR:  Scott Chasalow
#
#       DESCRIPTION:
#             Generate all combinations of the elements of x taken m at a
time. 
#             If x is a positive integer,  returns all combinations
#             of the elements of seq(x) taken m at a time.
#             If argument "fun" is not null,  applies a function given
#             by the argument to each point.  If simplify is FALSE,  returns 
#             a list; else returns a vector or an array.  "..." are passed 
#             unchanged to function given by argument fun,  if any.
#       REFERENCE:
#             Nijenhuis, A. and Wilf, H.S. (1978) Combinatorial Algorithms
for 
#             Computers and Calculators.  NY:  Academic Press.
#       EXAMPLES:
#             > combn(letters[1:4], 2)
#             > combn(10, 5, min)  # minimum value in each combination
#             Different way of encoding points:
#             > combn(c(1,1,1,1,2,2,2,3,3,4), 3, tabulate, nbins = 4)
#             Compute support points and (scaled) probabilities for a
#             Multivariate-Hypergeometric(n = 3, N = c(4,3,2,1)) p.f.:
#             > table.mat(t(combn(c(1,1,1,1,2,2,2,3,3,4), 3,
tabulate,nbins=4)))
#
        if(length(m) > 1) {
                warning(paste("Argument m has", length(m), "elements: only the 
first used")
                        )
                m <- m[1]
        }
        if(m < 0)
                stop("m < 0")
        if(m == 0)
                return(if(simplify) vector(mode(x), 0) else list())
        if(is.numeric(x) && length(x) == 1 && x > 0 && trunc(x) == x)
                x <- seq(x)
        n <- length(x)
        if(n < m)
                stop("n < m")
        e <- 0
        h <- m
        a <- 1:m
        nofun <- is.null(fun)
        count <- nCm(n, m, 0.1)
        out <- vector("list", count)
        out[[1]] <- if(nofun) x[a] else fun(x[a], ...)
        if(simplify) {
                dim.use <- NULL
                if(nofun) {
                        if(count > 1)
                                dim.use <- c(m, count)
                }
                else {
                        out1 <- out[[1]]
                        d <- dim(out1)
                        if(count > 1) {
                                if(length(d) > 1)
                                  dim.use <- c(d, count)
                                else if(length(out1) > 1)
                                  dim.use <- c(length(out1), count)
                        }
                        else if(length(d) > 1)
                                dim.use <- d
                }
        }
        i <- 2
        nmmp1 <- n - m + 1
        mp1 <- m + 1
        while(a[1] != nmmp1) {
                if(e < n - h) {
                        h <- 1
                        e <- a[m]
                        j <- 1
                }
                else {
                        h <- h + 1
                        e <- a[mp1 - h]
                        j <- 1:h
                }
                a[m - h + j] <- e + j
                out[[i]] <- if(nofun) x[a] else fun(x[a], ...)
                i <- i + 1
        }
        if(simplify) {
                if(is.null(dim.use))
                        out <- unlist(out)
                else out <- array(unlist(out), dim.use)
        }
        out
}

(from John Maindonald)
-------------------------------------------------------------

Dr. Helen Clough,
Postdoctoral Research Assistant,
Department of Veterinary Clinical Sciences,
Leahurst,
Neston,
South Wirral 
CH64 7TE

e-mail: wilsonh@liverpool.ac.uk



-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message:  unsubscribe s-news

<Prev in Thread] Current Thread [Next in Thread>
  • [S] Re. Query: combinations of groups, Helen Clough <=