********************************************************************
From: John Fox <jfox@mcmaster.ca>
Sent by: s-news-owner@lists.biostat.wustl.edu
To: "Patricia Farra" <patricia.farra@rogers.com>
cc: s-news@lists.biostat.wustl.edu
Subject: Re: [S] Categorization and use of if
Date: 11-06-2002 00:53

Dear Patricia,
At 05:16 PM 6/10/2002 -0400, Patricia Farra wrote:
>1) var1<-c(5.1,4.9,4.7,4.6,5.0,5.4,4.6,5.0,4.4,4.9,5.4,4.8,4.8,
>4.3,5.8, 5.7,5.4,5.1,5.7,5.1,5.4,5.1,4.6,5.1,4.8,5.0,5.0,5.2,5.2,4.7,
>4.8,5.4,5.2,5.5,4.9,5.0,5.5,4.9,4.4,5.1,5.0,4.5,4.4,5.0,5.1,4.8,5.1,
>4.6,5.3,5.0,7.0,6.4,6.9,5.5,6.5,5.7,6.3,4.9,6.6,5.2,5.0,5.9,6.0,
>6.1,5.6,6.7,5.6,5.8,6.2,5.6,5.9,6.1,6.3,6.1,6.4,6.6,6.8,6.7,6.0,
>5.7,5.5,5.5,5.8,6.0,5.4,6.0,6.7,6.3,5.6,5.5,5.5,6.1,5.8,5.0,5.6,
>5.7,5.7,6.2,5.1,5.7,6.3,5.8,7.1,6.3,6.5,7.6,4.9,7.3,6.7,7.2,6.5,
>6.4,6.8,5.7,5.8,6.4,6.5,7.7,7.7,6.0,6.9,5.6,7.7,6.3,6.7,7.2,6.2,
>6.1,6.4,7.2,7.4,7.9,6.4,6.3,6.1,7.7,6.3,6.4,6.0,6.9,6.7,6.9,5.8,
>6.8,6.7,6.7,6.3,6.5,6.2,5.9)
>
>I would like to split var1 into:
> a) equal-width-bins
> b) 7 bins
>
If you want seven equal-width bins, cut(var1, 7) will do the trick,
producing a "category" object as a result; as.numeric(cut(var1, 7))
produces a simple numeric vector.
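
As a rough illustration of the cut() call described above, here is a minimal sketch. It assumes R, where cut() returns a factor rather than an S-PLUS "category" object, and it uses a small made-up sample (x) instead of the full var1.

```r
# Small illustrative sample; NOT the full var1 from the question
x <- c(4.3, 4.9, 5.4, 5.8, 6.4, 7.0, 7.9)

# Seven equal-width intervals spanning the range of x
bins <- cut(x, 7)

# Plain numeric codes 1..7, one per observation
codes <- as.numeric(bins)

print(table(bins))   # counts per interval, labelled by the interval limits
print(codes)
```

The interval labels printed by table() show where the breakpoints fall, which is handy when comparing against a manual categorization.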
>2) I manually categorize var1 into 7 categories and get
>x1<-c(3,2,2,2,3,3,2,3,2,2,3,2,2,1,4,4,3,3,4,3,3,3,2,3,2,3,3,3,
>3,2,2,3,3,3,2,3,3,2,1,3,3,2,1,3,3,2,3,2,3,3,6,5,6,3,5,4,5,2,5,
>3,3,4,4,5,4,6,4,4,5,4,4,5,5,5,5,5,6,6,4,4,3,3,4,4,3,4,6,5,4,3,
>3,5,4,3,4,4,4,5,3,4,5,4,6,5,5,7,2,7,6,7,5,5,6,4,4,5,5,7,7,4,6,
>4,7,5,6,7,5,5,5,7,7,7,5,5,5,7,5,5,5,6,6,6,4,6,6,6,5,5,5,4)
>
Your result is a bit different from that produced by cut.
>I tried this
> > x1<-if(var1==1) 1000000 else
>+ if(var1==2) 0100000 else
>+ if(var1==3) 0010000 else
>+ if(var1==4) 0001000 else
>+ if(var1==5) 0000100 else
>+ if(var1==6) 0000010 else
>+ 0000001
>
>and got
>
> > x1
>[1] 10000
>
>Expected result: 0010000,0100000,0100000,0100000,etc...
>
>There are a few problems here. First, "if" is not vectorized; you can use
>"ifelse" instead. Second, since the result is numeric, leading zeroes will
>not appear. If you need them, you could use character values for the
>result. Third, the categorized version of your variable is x1, not var1:
>    x1 <- as.numeric(cut(var1, 7))
>    x1 <- ifelse(x1==1, '1000000',
>          ifelse(x1==2, '0100000',
>          ifelse(x1==3, '0010000',
>          ifelse(x1==4, '0001000',
>          ifelse(x1==5, '0000100',
>          ifelse(x1==6, '0000010', '0000001'))))))
>I hope that this helps,
> John
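
The two points about numeric results and vectorization can be checked on a small scale. This is only a sketch, assuming R; the vector codes and the fallback label "other" are made up for illustration. (In recent versions of R, an if() condition of length greater than one is an error, so the original if/else chain is not repeated here.)

```r
codes <- c(3, 2, 1, 7)   # illustrative bin codes, not the real data

# A numeric literal cannot keep leading zeros: 0100000 is simply 1e5
print(0100000 == 100000)

# ifelse() is evaluated elementwise, and character values keep their zeros
labels <- ifelse(codes == 1, "1000000",
          ifelse(codes == 2, "0100000",
          ifelse(codes == 3, "0010000", "other")))
print(labels)
```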
Another solution is to use factors, e.g. (with x1 holding the numeric
codes 1 to 7, before any ifelse replacement):
x1 <- factor(x1,
             levels=c("1","2","3","4","5","6","7"),
             labels=c('1000000','0100000','0010000',
                      '0001000','0000100','0000010','0000001'))
x1 <- as.character(x1)
This method saves no more time than the ifelse solution, but it is
very convenient for complicated replacement problems.
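
For what it is worth, the factor approach can be tried on a few values as below. This is a minimal sketch assuming R; codes and patterns are illustrative names, and the indexing variant at the end is an alternative not mentioned in the thread, shown only for comparison.

```r
codes <- c(3, 2, 2, 7, 1)   # illustrative bin codes, not the real data
patterns <- c("1000000", "0100000", "0010000", "0001000",
              "0000100", "0000010", "0000001")

# Factor approach: map each level 1..7 to its label, then drop to character
via_factor <- as.character(factor(codes, levels = 1:7, labels = patterns))

# Alternative (an assumption, not from the thread): index the label vector
via_index <- patterns[codes]

print(via_factor)
print(identical(via_factor, via_index))
```

Either form scales to longer label tables without deeper nesting, which is where the ifelse chain becomes awkward.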
Cheers,
Sven