Thanks Sven. This adds one more technique to my learning. Now, my understanding
is that cut(x,br) is based on the assumption of a normal distribution. Is there
any way to use another distribution? I'm afraid I may be wrong.
Thanks,
Pat.
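
On the question above: as far as the documented behaviour goes, cut() makes no distributional assumption. cut(x, 7) divides the range of x into equal-width intervals, and any other break points can be supplied instead. A minimal sketch in R syntax (S-PLUS may differ in details such as the class of the result); the data and the names x, br, eq.width and eq.count are made up for illustration:

```r
# Hypothetical data, for illustration only (skewed, not normal)
x <- c(0.1, 0.3, 0.4, 0.9, 1.2, 1.5, 2.8, 3.1, 4.7, 9.6)

# Equal-width bins: the range of x is split into 4 intervals of equal length
eq.width <- cut(x, 4)

# Roughly equal-count bins: use the empirical quartiles as break points
br <- quantile(x, probs = seq(0, 1, by = 0.25))
eq.count <- cut(x, breaks = br, include.lowest = TRUE)

# Compare how the observations fall into the two sets of bins
table(eq.width)
table(eq.count)
```

Break points taken from a theoretical distribution's quantile function could be passed to breaks in the same way.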
----- Original Message -----
From: <Sven.Knudsen@adeptscience.dk>
To: "Patricia Farra" <patricia.farra@rogers.com>; "John Fox"
<jfox@mcmaster.ca>
Cc: <s-news@lists.biostat.wustl.edu>
Sent: Wednesday, June 12, 2002 10:26 AM
Subject: Re: [S] Categorization and use of if
>
> From: John Fox <jfox@mcmaster.ca>
> Sent by: s-news-owner@lists.biostat.wustl.edu
> To: "Patricia Farra" <patricia.farra@rogers.com>
> cc: s-news@lists.biostat.wustl.edu
> Date: 11-06-2002 00:53
> Subject: Re: [S] Categorization and use of if
>
> Dear Patricia,
>
> At 05:16 PM 6/10/2002 -0400, Patricia Farra wrote:
>
> >1) var1<-c(5.1,4.9,4.7,4.6,5.0,5.4,4.6,5.0,4.4,4.9,5.4,4.8,4.8,
> >4.3,5.8, 5.7,5.4,5.1,5.7,5.1,5.4,5.1,4.6,5.1,4.8,5.0,5.0,5.2,5.2,4.7,
> >4.8,5.4,5.2,5.5,4.9,5.0,5.5,4.9,4.4,5.1,5.0,4.5,4.4,5.0,5.1,4.8,5.1,
> >4.6,5.3,5.0,7.0,6.4,6.9,5.5,6.5,5.7,6.3,4.9,6.6,5.2,5.0,5.9,6.0,
> >6.1,5.6,6.7,5.6,5.8,6.2,5.6,5.9,6.1,6.3,6.1,6.4,6.6,6.8,6.7,6.0,
> >5.7,5.5,5.5,5.8,6.0,5.4,6.0,6.7,6.3,5.6,5.5,5.5,6.1,5.8,5.0,5.6,
> >5.7,5.7,6.2,5.1,5.7,6.3,5.8,7.1,6.3,6.5,7.6,4.9,7.3,6.7,7.2,6.5,
> >6.4,6.8,5.7,5.8,6.4,6.5,7.7,7.7,6.0,6.9,5.6,7.7,6.3,6.7,7.2,6.2,
> >6.1,6.4,7.2,7.4,7.9,6.4,6.3,6.1,7.7,6.3,6.4,6.0,6.9,6.7,6.9,5.8,
> >6.8,6.7,6.7,6.3,6.5,6.2,5.9)
> >
> >I would like to split var1 into:
> > a) equal-width-bins
> > b) 7 bins
> >
>
> If you want seven equal-width bins, cut(var1, 7) will do the trick,
> producing a "category" object as a result; as.numeric(cut(var1, 7))
> produces a simple numeric vector.
>
> >2) I manually categorize var1 into 7 categories and get
> >x1<-c(3,2,2,2,3,3,2,3,2,2,3,2,2,1,4,4,3,3,4,3,3,3,2,3,2,3,3,3,
> >3,2,2,3,3,3,2,3,3,2,1,3,3,2,1,3,3,2,3,2,3,3,6,5,6,3,5,4,5,2,5,
> >3,3,4,4,5,4,6,4,4,5,4,4,5,5,5,5,5,6,6,4,4,3,3,4,4,3,4,6,5,4,3,
> >3,5,4,3,4,4,4,5,3,4,5,4,6,5,5,7,2,7,6,7,5,5,6,4,4,5,5,7,7,4,6,
> >4,7,5,6,7,5,5,5,7,7,7,5,5,5,7,5,5,5,6,6,6,4,6,6,6,5,5,5,4)
> >
>
> Your result is a bit different from that produced by cut.
>
> >I tried this
> > > x1<-if(var1==1) 1000000 else
> >+ if(var1==2) 0100000 else
> >+ if(var1==3) 0010000 else
> >+ if(var1==4) 0001000 else
> >+ if(var1==5) 0000100 else
> >+ if(var1==6) 0000010 else
> >+ 0000001
> >
> >and got
> >
> > > x1
> >[1] 10000
> >
> >Expected result: 0010000,0100000,0100000,0100000,etc...
> >
>
> There are a few problems here. First, "if" is not vectorized; you can use
> "ifelse" instead. Second, since the result is numeric, leading zeroes will
> not appear. If you need them, you could use character values for the
> result. Third, the categorized version of your variable is x1, not var1:
>
>   x1 <- as.numeric(cut(var1, 7))
>
>   x1 <- ifelse(x1==1, '1000000',
>         ifelse(x1==2, '0100000',
>         ifelse(x1==3, '0010000',
>         ifelse(x1==4, '0001000',
>         ifelse(x1==5, '0000100',
>         ifelse(x1==6, '0000010', '0000001'))))))
>
> I hope that this helps,
>  John
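
The first two points can be seen in a small sketch, written in R syntax with a made-up vector z of category codes. One assumption to flag: recent versions of R stop with an error when if() is given a condition of length greater than one, while older R (and, as far as is known, S-PLUS) used only the first element, so the sketch tests a single element explicitly:

```r
# Hypothetical category codes, for illustration only
z <- c(3, 2, 2, 1)

# if() wants a single TRUE/FALSE, so it can only look at one element
first.only <- if (z[1] == 3) "0010000" else "other"

# ifelse() tests every element and returns a vector of the same length
all.elems <- ifelse(z == 3, "0010000", "other")

# A numeric constant does not keep leading zeros: this is the number 10000
num <- 0010000

first.only
all.elems
num
```

Quoting the codes, as in the nested ifelse() above, is what preserves the leading zeros.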
>
>
> Another solution is to use factors, e.g.
> x1 <- factor(x1,
> levels=c("1","2","3","4","5","6","7"),
> labels=c('1000000','0100000','0010000',
> '0001000','0000100','0000010','0000001'))
>
> x1 <- as.character(x1)
>
> This method saves no more time than the ifelse solution here, but for
> complicated replacement problems it is very convenient.
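
A closely related variant, not taken from the messages above, is to use the category codes as subscripts into a vector of labels. The sketch below is in R syntax and assumes the codes are whole numbers from 1 to 7; the names z, labs, via.factor and via.index are hypothetical:

```r
# Hypothetical category codes in 1..7, for illustration only
z <- c(3, 2, 7, 1)

# One label per category, in category order
labs <- c("1000000", "0100000", "0010000", "0001000",
          "0000100", "0000010", "0000001")

# Factor route, as suggested above: relabel the levels, then convert
via.factor <- as.character(factor(z, levels = 1:7, labels = labs))

# Indexing route: use the codes as subscripts into the label vector
via.index <- labs[z]

# Both routes should give the same character vector
identical(via.factor, via.index)
```

The factor route has the advantage that the levels need not be consecutive integers, which is presumably what makes it convenient for more complicated replacements.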
>
> Cheers
> Sven