I thank Andy Liaw, Chuck Cleland, Bert Gunter, Nick Ellis, Charles Wright,
Patrick Burns, Spencer Graves, James Holtman, Frédéric Gosselin, Andrew
Robinson, and Tamara Shatar for their very helpful suggestions. I have
listed their responses below as well as my original question.
From: Tamara M. Shatar
Hi,
You might want to try using ifelse statements.
You can fulfill multiple criteria by using them within each other, e.g.
ifelse(mymat<0.5,0,(ifelse(mymat>=1.5,2,1)))
I hope this helps,
Tamara.
From: Nick Ellis
ifelse() is the most direct way, but cut() might be easier, especially for
the case of general break points. However cut uses intervals that include
the right-hand point and exclude the left ("0.5+ thru 1.5"), whereas you
want the opposite. One trick is to apply cut to the negative of the data and
reverse the order of the levels.
> x <- seq(0,2,0.1)
> x
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
> ifelse(x<0.5,0,ifelse(x<1.5,1,2))
[1] 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
> cut(x,breaks=c(-Inf,0.5,1.5,Inf),factor=T)
 [1] -Inf+ thru 0.5 -Inf+ thru 0.5 -Inf+ thru 0.5 -Inf+ thru 0.5 -Inf+ thru 0.5 -Inf+ thru 0.5 0.5+ thru 1.5 0.5+ thru 1.5
 [9] 0.5+ thru 1.5 0.5+ thru 1.5 0.5+ thru 1.5 0.5+ thru 1.5 0.5+ thru 1.5 0.5+ thru 1.5 0.5+ thru 1.5 0.5+ thru 1.5
[17] 1.5+ thru Inf 1.5+ thru Inf 1.5+ thru Inf 1.5+ thru Inf 1.5+ thru Inf
> c(0,1,2)[cut(x,breaks=c(-Inf,0.5,1.5,Inf),factor=T)] # using a factor as an index
[1] 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 # not quite what you want
> rev(c(0,1,2))[cut(-x,breaks=-rev(c(-Inf,0.5,1.5,Inf)),factor=T)]
[1] 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 # this is what you want
Nick Ellis
CSIRO Marine Research mailto:Nick.Ellis@csiro.au
PO Box 120 ph +61 (07) 3826 7260
Cleveland QLD 4163 fax +61 (07) 3826 7222
Australia http://www.marine.csiro.au
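For readers working in R rather than S-PLUS, the negate-and-reverse trick above can usually be avoided. The following is a minimal sketch, assuming R's cut(), whose right = FALSE argument makes the intervals closed on the left and labels = FALSE returns integer codes; these arguments are R's and may not exist in S-PLUS 6.1. The test vector is made up for illustration.

```r
# Illustrative values chosen to sit on and around both cut points
x <- c(0, 0.49, 0.5, 1, 1.49, 1.5, 2)

# right = FALSE gives intervals [-Inf, 0.5), [0.5, 1.5), [1.5, Inf);
# labels = FALSE returns the integer codes 1, 2, 3, so subtract 1
score <- cut(x, breaks = c(-Inf, 0.5, 1.5, Inf),
             right = FALSE, labels = FALSE) - 1

print(score)
```

Here 0.5 falls in the middle category and 1.5 in the top one, which is the boundary behaviour the original question asked for.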
From: Charles (Ted) Wright
If df is your data frame containing solely numeric quantities to be coded,
then the following should work
df[df < .5] <- 0
df[.5 <= df & df < 1.5] <- 1
df[df >= 1.5] <- 2
Ted Wright
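One caveat with assigning in place is that each step tests values that earlier steps may already have overwritten. With these particular cut points the codes 0, 1 and 2 happen to land back in their own ranges, but that is a coincidence of the thresholds. A minimal sketch in R syntax that computes the logical masks from the original values first (the data frame and the names low and high are made up for illustration):

```r
# Small illustrative data frame; values sit on and around the cut points
df <- data.frame(a = c(0.2, 0.5, 1.5), b = c(1.49, 2.0, 0.0))

# Build the masks from the ORIGINAL values before overwriting anything
low  <- df < 0.5     # logical matrix, TRUE where the score should be 0
high <- df >= 1.5    # logical matrix, TRUE where the score should be 2

df[low] <- 0
df[!low & !high] <- 1
df[high] <- 2

print(df)
```

Because the masks are fixed up front, the order of the three assignments no longer matters.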
From: Frédéric Gosselin
I would say commands like:
x[x<0.5] <- 0
(for the first condition)
should work.
Sincerely,
Frédéric Gosselin
Researcher (PhD) & Engineer in Forest Ecology
Cemagref
Domaine des Barres
45 290 Nogent-sur-Vernisson
FRANCE
Tel: 33-2-38-95-03-58
Fax: 33-2-38-95-03-44
-----Original Message-----
From: Patrick Burns [mailto:pburns@pburns.seanet.com]
Sent: Thursday, March 20, 2003 13:13
To: Cougar@psu.edu
Subject: Re: [S] recode
You can do things like:
x[x < 0.5] <- 0
(I'm not sure that will work on a whole data frame in one go, though; it will
on a matrix.)
S Poetry might give you some more ideas.
Good luck,
Patrick Burns
Burns Statistics
patrick@burns-stat.com
+44 (0) 208 525 0696
http://www.burns-stat.com/ (new home of S Poetry)
-----Original Message-----
From: james.holtman@convergys.com [mailto:james.holtman@convergys.com]
Sent: Thursday, March 20, 2003 12:45
To: Cougar@psu.edu
Subject: Re: [S] recode
If the dataframe only contains numeric, then make a matrix and do the
following:
x.1 <- as.matrix(dataframe)
x.1 <- ifelse(x.1<.5, 0, ifelse(x.1>= 1.5, 2, 1))
if there are only certain columns that you want to do it on, then iterate on
those columns with the above statement.
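Iterating over selected columns does not need an explicit loop. A minimal sketch in R syntax, assuming a hypothetical data frame dat with one non-numeric column and a helper recode3() that wraps the ifelse() statement above (both names are invented for this example):

```r
# Hypothetical helper wrapping the nested ifelse() recode
recode3 <- function(v) ifelse(v < 0.5, 0, ifelse(v >= 1.5, 2, 1))

# Illustrative data: 'id' must be left alone, 'u' and 'v' are recoded
dat <- data.frame(id = c("a", "b", "c"),
                  u  = c(0.1, 0.5, 1.5),
                  v  = c(1.49, 2.0, 0.49),
                  stringsAsFactors = FALSE)

cols <- c("u", "v")                      # columns to recode
dat[cols] <- lapply(dat[cols], recode3)  # apply the recode column by column

print(dat)
```

Only the columns named in cols are touched, so character or factor columns elsewhere in the data frame are preserved.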
From: Andy Liaw
Coerce the data to a vector, use cut() to categorize it, use codes() to turn
the result into 1, 2, etc., and subtract 1 so that it starts at 0. Add the
dim attribute back and, if needed, coerce to data.frame. E.g.,
> x <- as.data.frame(matrix(runif(600*200), 600, 200)*2)
> xc <- codes(cut(as.matrix(x), c(-Inf, 0.5, 1.5, Inf))) - 1
> table(xc)
0 1 2
29854 60021 30125
> dim(xc) <- dim(x)
> xc <- as.data.frame(xc)
Andy
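The same vectorize, categorize, restore-dimensions pattern can be sketched in R, where codes() is no longer available. This version swaps in findInterval(), which is an R function and not part of the reply above; it returns 0 for values below the first break and treats each break as the closed left end of the next interval, so 0.5 scores 1 and 1.5 scores 2. The small 6 x 4 size is only to keep the example quick.

```r
set.seed(1)  # reproducible illustrative data
x <- as.data.frame(matrix(runif(6 * 4) * 2, 6, 4))

m  <- as.matrix(x)                  # coerce to a numeric matrix
xc <- findInterval(m, c(0.5, 1.5))  # 0 if < 0.5, 1 if in [0.5, 1.5), 2 if >= 1.5
dim(xc) <- dim(m)                   # put the dim attribute back
xc <- as.data.frame(xc)             # coerce back to a data frame if needed

print(table(as.matrix(xc)))
```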
> -----Original Message-----
> From: s-news-owner@lists.biostat.wustl.edu
> [mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Spencer
> Graves
> Sent: Thursday, March 20, 2003 12:51
> To: Cougar@psu.edu
> Cc: s-news@lists.biostat.wustl.edu
> Subject: Re: [S] recode
>
>
> How about the following:
>
> tst.data <- data.frame(a=seq(0, 2, length=9), b=seq(0, 2, length=9))
> round(tst.data)
>
> Alternatively:
>
> (tst.data>=.5)+(tst.data>=1.5)
>
> Do these work for you?
> Best Wishes,
> Spencer Graves
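The logical-sum idea works because TRUE and FALSE are treated as 1 and 0 in arithmetic, so adding one comparison per cut point counts how many thresholds each value has reached. A minimal sketch in R syntax, with both comparisons inclusive so that 1.5 scores 2 as the original question requires (the name score is made up for this example):

```r
tst.data <- data.frame(a = seq(0, 2, length.out = 9),
                       b = seq(0, 2, length.out = 9))

# Each comparison yields a logical matrix; adding them coerces TRUE/FALSE
# to 1/0, giving 0, 1 or 2 thresholds reached per cell
score <- (tst.data >= 0.5) + (tst.data >= 1.5)

print(score)
```

Note that round() is not an exact substitute: in R it rounds halves to the even digit, so round(0.5) gives 0 rather than 1, and any value of 2.5 or more would round to 3 or higher.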
From: Chuck Cleland
Here is one untested idea using apply() and cut().
mydata <- matrix(rnorm(600*200), ncol=200)
apply(mydata, 2, function(x){cut(x, c(min(x), 0.5, 1.5, max(x)))}) - 1
I don't think my example gets it. For example,
> cut(1:10, c(1, 3, 8, 10))
[1] NA 1 1 2 2 2 2 2 3 3
So you might need something like:
cut(x, c(min(x) - 1, 0.49, 1.5, max(x)))
For the particular problem you mentioned, I thought Spencer
Graves' second solution was quite nice.
regards,
Chuck
Original Question:
> I would like to know how to quickly recode multiple variables. In one
> problem I have, I need to recode the data in a 600 x 200 data frame so
> that values less than 0.5 are scored 0, those equal to or greater than
> 1.5 are scored 2, and those in between are scored 1. Using loops
> seems to take a very long time. Is there a more efficient way to
> accomplish the recode? WIN 2k, Splus 6.1
Respectfully,
Frank R. Lawrence, Ph.D.