
Re: splitting data

To: "Arjun Bhandari" <arb.eu@adia.ae>
Subject: Re: splitting data
From: Frank E Harrell Jr <feh3k@spamcop.net>
Date: Wed, 17 Dec 2003 18:51:02 -0500
Cc: arb.eu@adia.co.ae, s-news@lists.biostat.wustl.edu
In-reply-to: <OF4FF907B0.98708AAA-ON44256C8C.00359B4E-44256DFE.00329A6E@adiaweb.adia.co.ae>
Organization: Vanderbilt University
References: <OF89449892.7685CBA2-ON44256C8C.00310EE7@adiaweb.adia.co.ae> <OF4FF907B0.98708AAA-ON44256C8C.00359B4E-44256DFE.00329A6E@adiaweb.adia.co.ae>
Also look at the cut2 function in the Hmisc library.  -FH
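A minimal sketch of that suggestion, assuming the Hmisc package is installed and using made-up data (the vector y and the choice of 23 values are illustrative only). The g argument of cut2 asks for that many quantile groups; with tied values the group sizes need not match exactly, so check the table.

```r
# Hypothetical data: 23 values to be cut into 5 quantile groups.
set.seed(1)
y <- rnorm(23)

# Guarded so the sketch still runs where Hmisc is not installed.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  grp <- Hmisc::cut2(y, g = 5)   # g = number of quantile groups
  print(table(grp))              # inspect the resulting group sizes
}
```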

On Tue, 16 Dec 2003 13:12:43 +0400
"Arjun Bhandari" <arb.eu@adia.ae> wrote:

> 
> Arjun,
> 
> How about the following self-made code:
> 
> divide <- function(n, k) {
>   # Divide a sample of size n into k subgroups whose sizes
>   # are as close to each other as possible.
>   #
>   # n is the sample size
>   # k is the number of subgroups you specify.
>   a <- n %% k    # number of subgroups that get one extra observation
>   b <- n %/% k   # base size of every subgroup
>   x <- rep(b, k)
>   if (a == 0) return(x)
>
>   x[1:a] <- x[1:a] + 1
>   # If you would like to randomly assign the extra ones to the
>   # categories, you can do this instead:
>   # chosen <- sample(1:k, a, replace = FALSE)
>   # x[chosen] <- x[chosen] + 1
>   return(x)
> }
> 
> Then use the vector generated by this function to subset the data.
> For instance, if x is your data (a matrix or data frame with n rows):
> 
> Sizes <- divide(n, k)
> CumSizes <- c(0, cumsum(Sizes))
> for (i in 1:k)
>   assign(paste("junk", i, sep = ""), x[(CumSizes[i] + 1):CumSizes[i + 1], ])
> 
> Note: you can still do the subsetting randomly in the same way
> as in the function. However, the random design here will render
> the randomness within the above function unnecessary, right?
> 
> Good luck,
> jingshan
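The random subsetting mentioned in the note above can be sketched in base R as follows. This is an illustration under assumptions, not code from the thread: the data frame dat, the seed, and n = 23 are made up, and rep(..., length.out = ) is R syntax. Cycling the labels 1..k over n rows gives group sizes that differ by at most 1, and sample() shuffles which rows receive which label.

```r
# Hypothetical data: 23 rows, to be split into k = 5 parts.
set.seed(42)
dat <- data.frame(id = 1:23, value = rnorm(23))
n <- nrow(dat)
k <- 5

# Labels 1..k repeated to length n, then shuffled across the rows.
part <- sample(rep(1:k, length.out = n))

# split() returns a list with one data frame per label.
pieces <- split(dat, part)
sizes <- sapply(pieces, nrow)
print(sizes)
```

Keeping the parts in a list avoids creating junk1, junk2, ... with assign(); pieces[[i]] is the i-th part.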
> 
> 
> 
> On Wed, 11 Dec 2002, Arjun Bhandari wrote:
> 
> > Hi,
> >
> > Please ignore my earlier mail. I want to split the data into five
> > approximately equal parts. Is there a function in S-PLUS which allows
> > me to do that? I want to do this such that the number of data points
> > in each category is similar, i.e. a maximum difference of 1.
> >
> > Best Regards
> > Arjun Bhandari
> >
> >
> >
> >
> >
> >
> > ********************************************************************
> > The contents of this mail are personal opinions of the Author.
> > ADIA disclaims all responsibility and accepts no liability,
> > whatsoever.
> > ********************************************************************
> >
> > --------------------------------------------------------------------
> > This message was distributed by s-news@lists.biostat.wustl.edu.  To
> > unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
> > the BODY of the message:  unsubscribe s-news
> >
> 


---
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
