s-news

recode

To: <s-news@lists.biostat.wustl.edu>
Subject: recode
From: "Frank Lawrence" <Cougar@psu.edu>
Date: Sun, 23 Mar 2003 17:15:47 -0500
Importance: Normal
Organization: PSU
Reply-to: <Cougar@psu.edu>
I thank Andy Liaw, Chuck Cleland, Bert Gunter, Nick Ellis, Charles Wright,
Patrick Burns, Spencer Graves, James Holtman, Frédéric Gosselin, Andrew
Robinson, and Tamara Shatar for their very helpful suggestions.  I have
listed their responses below as well as my original question.


From: Tamara M. Shatar

Hi,

You might want to try using ifelse statements.

You can handle multiple criteria by nesting them inside each other, e.g.

ifelse(mymat<0.5,0,(ifelse(mymat>=1.5,2,1)))
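A minimal sketch of the nested ifelse() idea, written for R (an assumption; S-PLUS 6.1 should behave the same for this call). The 3 x 4 matrix is hypothetical test data chosen to include the boundary values 0.5 and 1.5. ifelse() returns a result with the same dim as its test argument, so the recoded object is still a matrix and no loop is needed.

```r
# Hypothetical 3 x 4 test matrix, including the boundary values 0.5 and 1.5
mymat <- matrix(c(0, 0.2, 0.5, 0.9, 1.2, 1.49, 1.5, 1.8, 2, 0.49, 0.51, 1.51),
                nrow = 3)

# Nested ifelse(): below 0.5 -> 0, at least 1.5 -> 2, everything else -> 1
recoded <- ifelse(mymat < 0.5, 0, ifelse(mymat >= 1.5, 2, 1))

# Still a 3 x 4 matrix: ifelse() keeps the dim attribute of its test
print(recoded)
```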

I hope this helps,

Tamara.

From: Nick Ellis

ifelse() is the most direct way, but cut() might be easier, especially for
the case of general break points. However cut uses intervals that include
the right-hand point and exclude the left ("0.5+ thru 1.5"), whereas you
want the opposite. One trick is to apply cut to the negative of the data and
reverse the order of the levels. 

> x <- seq(0,2,0.1)
> x
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
> ifelse(x<0.5,0,ifelse(x<1.5,1,2))
 [1] 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
> cut(x,breaks=c(-Inf,0.5,1.5,Inf),factor=T)
 [1] -Inf+ thru 0.5 -Inf+ thru 0.5 -Inf+ thru 0.5 -Inf+ thru 0.5 -Inf+ thru 0.5 -Inf+ thru 0.5  0.5+ thru 1.5  0.5+ thru 1.5
 [9]  0.5+ thru 1.5  0.5+ thru 1.5  0.5+ thru 1.5  0.5+ thru 1.5  0.5+ thru 1.5  0.5+ thru 1.5  0.5+ thru 1.5  0.5+ thru 1.5
[17]  1.5+ thru Inf  1.5+ thru Inf  1.5+ thru Inf  1.5+ thru Inf  1.5+ thru Inf
> c(0,1,2)[cut(x,breaks=c(-Inf,0.5,1.5,Inf),factor=T)]  # using a factor as an index
 [1] 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2                  # not quite what you want
> rev(c(0,1,2))[cut(-x,breaks=-rev(c(-Inf,0.5,1.5,Inf)),factor=T)]
 [1] 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2                  # this is what you want
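For comparison, a sketch in R (an assumption: the transcript above is S-PLUS, whose cut() labels and arguments differ). R's cut() accepts right = FALSE, which closes each interval on the left, [a, b), so the requested coding falls out directly without negating the data. With labels = FALSE it returns the bin numbers 1, 2, 3, and subtracting 1 gives 0, 1, 2.

```r
x <- seq(0, 2, 0.1)

# right = FALSE gives the intervals [-Inf, 0.5), [0.5, 1.5), [1.5, Inf)
# labels = FALSE returns the bin number (1, 2, 3) instead of a factor
codes01 <- cut(x, breaks = c(-Inf, 0.5, 1.5, Inf), right = FALSE,
               labels = FALSE) - 1

print(codes01)
```

The result should agree element by element with the nested ifelse() line in the transcript.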


Nick Ellis
CSIRO Marine Research   mailto:Nick.Ellis@csiro.au
PO Box 120                      ph    +61 (07) 3826 7260
Cleveland QLD 4163      fax   +61 (07) 3826 7222
Australia                       http://www.marine.csiro.au

From: Charles (Ted) Wright

If df is your data frame containing solely numeric quantities to be coded,
then the following should work:

df[df < .5] <- 0
df[.5 <= df & df < 1.5] <- 1
df[df >= 1.5] <- 2
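These sequential statements give the right answer only because none of the new codes (0, 1, 2) falls into a range that a later statement tests; with codes 1, 2, 3, for instance, the 1s assigned first would be caught again by the second test. A slightly safer sketch in R, assuming df holds only numeric columns (the data below is hypothetical), takes every comparison against an untouched copy and only then assigns:

```r
# Hypothetical data frame of numeric columns, with boundary values
df <- data.frame(a = c(0.1, 0.5, 1.5), b = c(1.49, 2, 0.49))

orig <- as.matrix(df)   # untouched copy used for every comparison

# Logical-matrix indexing assigns into the matching cells of the data frame
df[orig < 0.5] <- 0
df[orig >= 0.5 & orig < 1.5] <- 1
df[orig >= 1.5] <- 2

print(df)
```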

Ted Wright

From: Frédéric Gosselin 

I would say commands like:

x[x < 0.5] <- 0

(for the first condition) 

should work. 

Sincerely, 

Frédéric Gosselin 
Researcher (PhD) & Engineer in Forest Ecology 
Cemagref 
Domaine des Barres 
45 290 Nogent-sur-Vernisson 
FRANCE 

Tel: 33-2-38-95-03-58 
Fax: 33-2-38-95-03-44 

-----Original Message-----
From: Patrick Burns [mailto:pburns@pburns.seanet.com] 
Sent: Thursday, March 20, 2003 13:13
To: Cougar@psu.edu
Subject: Re: [S] recode


You can do things like:

x[x < 0.5] <- 0

(I'm not sure that will work on a whole data frame in one go, though; it will
on a matrix.)

S Poetry might give you some more ideas.

Good luck,

Patrick Burns

Burns Statistics
patrick@burns-stat.com
+44 (0) 208 525 0696
http://www.burns-stat.com/    (new home of S Poetry)

-----Original Message-----
From: james.holtman@convergys.com [mailto:james.holtman@convergys.com]
Sent: Thursday, March 20, 2003 12:45
To: Cougar@psu.edu
Subject: Re: [S] recode



If the data frame contains only numeric columns, convert it to a matrix and
do the following:

x.1 <- as.matrix(dataframe)
x.1 <- ifelse(x.1 < .5, 0, ifelse(x.1 >= 1.5, 2, 1))

If you only want to recode certain columns, iterate over those columns with
the statement above.
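A sketch of the per-column version in R, assuming a data frame with some columns that should not be touched; the data frame contents and the column names in cols are hypothetical. The loop runs once per column rather than once per cell, so 200 columns means only 200 vectorized ifelse() calls.

```r
# Hypothetical data: an id column to keep, two numeric columns to recode
dataframe <- data.frame(id = 1:4,
                        v1 = c(0.2, 0.5, 1.4, 1.5),
                        v2 = c(2.0, 0.49, 0.9, 0.0))

cols <- c("v1", "v2")   # hypothetical choice of columns to recode

for (col in cols) {
  v <- dataframe[[col]]
  # Same nested ifelse() as above, applied to one whole column at a time
  dataframe[[col]] <- ifelse(v < 0.5, 0, ifelse(v >= 1.5, 2, 1))
}

print(dataframe)
```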

From Andy Liaw:

Coerce the data to a vector, use cut() to categorize it, use codes() to turn
the result into 1, 2, etc., and subtract 1 so that it starts at 0.  Add the
dim attribute back and, if needed, coerce to a data.frame.  E.g.,

> x <- as.data.frame(matrix(runif(600*200), 600, 200)*2)
> xc <- codes(cut(as.matrix(x), c(-Inf, 0.5, 1.5, Inf))) - 1
> table(xc)
     0     1     2 
 29854 60021 30125
> dim(xc) <- dim(x)
> xc <- as.data.frame(xc)
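The same outline in R can use findInterval() (R-specific; not confirmed for S-PLUS 6.1). For sorted breaks it returns the number of breaks less than or equal to each value, so with breaks c(0.5, 1.5) it gives 0 below 0.5, 1 on [0.5, 1.5) and 2 from 1.5 up. That matches the requested coding at the boundaries, where cut()'s right-closed intervals put 0.5 and 1.5 in the lower bin. A sketch with hypothetical random data:

```r
set.seed(1)  # reproducible hypothetical data
x <- as.data.frame(matrix(runif(600 * 200) * 2, 600, 200))

m  <- as.matrix(x)                   # numeric matrix of all cells
xc <- findInterval(m, c(0.5, 1.5))   # 0, 1 or 2 per cell; result has no dim
dim(xc) <- dim(m)                    # put the 600 x 200 shape back
xc <- as.data.frame(xc)              # only if a data frame is needed

table(unlist(xc))
```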


Andy

> -----Original Message-----
> From: s-news-owner@lists.biostat.wustl.edu
> [mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Spencer Graves
> Sent: Thursday, March 20, 2003 12:51
> To: Cougar@psu.edu
> Cc: s-news@lists.biostat.wustl.edu
> Subject: Re: [S] recode
> 
> 
> How about the following:
> 
> tst.data <- data.frame(a=seq(0, 2, length=9), b=seq(0, 2, length=9))
> round(tst.data)
> 
> Alternatively:
> 
> (tst.data >= .5) + (tst.data >= 1.5)
> 
> Do these work for you?
> Best Wishes,
> Spencer Graves
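Spencer's second suggestion works because TRUE and FALSE count as 1 and 0 in arithmetic, so each cell's code is the number of thresholds it reaches. A sketch in R using the thresholds from the original question (>= 0.5 and >= 1.5). The last line shows why round() is not a drop-in replacement at the boundaries: R rounds halves to the even digit, so 0.5 becomes 0 rather than the requested 1.

```r
tst.data <- data.frame(a = seq(0, 2, length.out = 9),
                       b = seq(0, 2, length.out = 9))

# Each comparison returns a logical matrix; adding them counts thresholds met
coded <- (tst.data >= 0.5) + (tst.data >= 1.5)
print(coded)

# round() sends 0.5 to 0 (round half to even), unlike the requested coding
round(c(0.5, 1.5))
```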

From: Chuck Cleland


  Here is one untested idea using apply() and cut().

mydata <- matrix(rnorm(600*200), ncol=200)

apply(mydata, 2, function(x){cut(x, c(min(x), 0.5, 1.5, max(x)))}) - 1

I don't think my example gets it right: cut() excludes the lowest break, so
min(x) itself comes out as NA.  For example,

 > cut(1:10, c(1, 3, 8, 10))
  [1] NA  1  1  2  2  2  2  2  3  3

So you might need something like:

cut(x, c(min(x) - 1, 0.49, 1.5, max(x)))
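Another way around the NA, sketched in R (the right and labels arguments of cut() are R's; whether S-PLUS 6.1 accepts them is an assumption to check): use -Inf and Inf as the outer breaks so no value falls outside the bins, and close the intervals on the left so that 0.5 and 1.5 land in the upper bin as the question asks. The random matrix and the helper name recode_col are hypothetical.

```r
set.seed(2)  # reproducible hypothetical data
mydata <- matrix(rnorm(600 * 200), ncol = 200)

# Hypothetical helper: infinite outer breaks mean no NA, right = FALSE gives
# [-Inf, 0.5), [0.5, 1.5), [1.5, Inf), labels = FALSE returns bin numbers
recode_col <- function(x) {
  cut(x, breaks = c(-Inf, 0.5, 1.5, Inf), right = FALSE, labels = FALSE) - 1
}

# apply() over columns; each result has length 600, so a 600 x 200 matrix
coded <- apply(mydata, 2, recode_col)
```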

For the particular problem you mentioned, I thought Spencer
Graves' second solution was quite nice.

regards,

Chuck

Original Question:
> I would like to know how to quickly recode multiple variables.  In one 
> problem I have, I need to recode the data in a 600 x 200 data frame so 
> that values less than 0.5 are scored 0, those equal to or greater than 
> 1.5 are scored 2, and those in between are scored 1.  Using loops 
> seems to take a very long time.  Is there a more efficient way to 
> accomplish the recode? WIN 2k, Splus 6.1

Respectfully, 
Frank R. Lawrence, Ph.D.

