Re: subscripting on multiple columns

To: "Thompson, David (MNR)" <David.John.Thompson@ontario.ca>
Subject: Re: subscripting on multiple columns
From: Chuck Cleland <ccleland@optonline.net>
Date: Mon, 12 Feb 2007 13:47:15 -0500
Cc: s-news <s-news@lists.biostat.wustl.edu>
In-reply-to: <ECF21B71808ECF4F8918C57EDBEE121D8DD256@CTSPITDCEMMVX11.cihs.ad.gov.on.ca>
References: <ECF21B71808ECF4F8918C57EDBEE121D8DD256@CTSPITDCEMMVX11.cihs.ad.gov.on.ca>
User-agent: Thunderbird 1.5.0.9 (Windows/20061207)
Thompson, David (MNR) wrote:
> Hello,
> 
> Say I have several (3+) factors with many (100+) responses and I would
> like to pull the subset where *any* of the responses fit an arbitrary
> condition.
> What would be the proper way to subscript the data.frame in the most
> concise, generic fashion?
> 
> For example:
>> set.seed(48)
>> junk <- cbind(fac.design(c(2,2,2),c('a','b','c')),
> r1=round(runif(8,0,10),0), r2=round(runif(8,0,10),0),
> r3=round(runif(8,0,10),0), r4=round(runif(8,0,10),0))
>> junk
>    a  b  c r1 r2 r3 r4 
> 1 a1 b1 c1  2 10  4  7
> 2 a2 b1 c1  8  8  3  6
> 3 a1 b2 c1  3  0 10  9
> 4 a2 b2 c1 10  0  9  1
> 5 a1 b1 c2  6  4  2  6
> 6 a2 b1 c2  1  3  9  8
> 7 a1 b2 c2  9  4  6  1
> 8 a2 b2 c2  8  8  6 10
> 
> I can extract the desired subset by:
>> junk[junk$r1<2 | junk$r2<2 | junk$r3<2 | junk$r4<2,]
>    a  b  c r1 r2 r3 r4 
> 3 a1 b2 c1  3  0 10  9
> 4 a2 b2 c1 10  0  9  1
> 6 a2 b1 c2  1  3  9  8
> 7 a1 b2 c2  9  4  6  1
> 
> But, what if there are many (100+) such columns to query?
> I was hoping to be able to do something like:
>> junk[junk[,3:7]<2,]
>      a  b  c r1 r2 r3 r4 
>  NA NA NA NA NA NA NA NA
> NA1 NA NA NA NA NA NA NA
> NA2 NA NA NA NA NA NA NA
> NA3 NA NA NA NA NA NA NA
> NA4 NA NA NA NA NA NA NA
> NA5 NA NA NA NA NA NA NA
> NA6 NA NA NA NA NA NA NA
> NA7 NA NA NA NA NA NA NA
>   X NA NA NA NA NA NA NA
>  X8 NA NA NA NA NA NA NA
>  X9 NA NA NA NA NA NA NA
> X10 NA NA NA NA NA NA NA

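  I believe your approach failed for two reasons.  First, column 3 (c)
is a factor, so comparing it with 2 is not meaningful.  Second,
junk[,4:7] < 2 returns a logical matrix with four values for each row,
whereas subscripting rows needs a single logical value per row.  How
about this?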

junk[rowSums(junk[,grep("^r", names(junk))] < 2) > 0,]

   a  b  c r1 r2 r3 r4
3 a1 b2 c1  3  0 10  9
4 a2 b2 c1 10  0  9  1
6 a2 b1 c2  1  3  9  8
7 a1 b2 c2  9  4  6  1

  or similarly

junk[apply(junk[,4:7] < 2, 1, any),]

   a  b  c r1 r2 r3 r4
3 a1 b2 c1  3  0 10  9
4 a2 b2 c1 10  0  9  1
6 a2 b1 c2  1  3  9  8
7 a1 b2 c2  9  4  6  1
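
  For 100+ response columns, it may be more convenient to pick the
columns by type rather than by name or position.  The following is only
a sketch in R syntax, not tested in S-PLUS: the data are typed in by
hand from the printout above (fac.design() and the seed are not
reproduced), and any.below is a made-up helper name, not an existing
function.

```r
# Rebuild the example data by hand from the printed output in the post.
junk <- data.frame(
  a  = factor(c("a1", "a2", "a1", "a2", "a1", "a2", "a1", "a2")),
  b  = factor(c("b1", "b1", "b2", "b2", "b1", "b1", "b2", "b2")),
  c  = factor(c("c1", "c1", "c1", "c1", "c2", "c2", "c2", "c2")),
  r1 = c(2, 8, 3, 10, 6, 1, 9, 8),
  r2 = c(10, 8, 0, 0, 4, 3, 4, 8),
  r3 = c(4, 3, 10, 9, 2, 9, 6, 6),
  r4 = c(7, 6, 9, 1, 6, 8, 1, 10)
)

# The comparison gives a logical matrix: one row per observation and
# one column per response, so it cannot be used directly as a row index.
hits <- junk[, 4:7] < 2
dim(hits)

# Hypothetical helper: keep the rows where any numeric column is below
# `cutoff`.  Factor columns are skipped; NA comparisons count as FALSE.
any.below <- function(df, cutoff) {
  is.num <- sapply(df, is.numeric)
  keep <- rowSums(df[, is.num, drop = FALSE] < cutoff, na.rm = TRUE) > 0
  df[keep, , drop = FALSE]
}

res <- any.below(junk, 2)
res
```

  Here any.below(junk, 2) should return the same rows 3, 4, 6 and 7 as
the two expressions above.  Selecting by is.numeric assumes that every
numeric column is a response; if that is not the case, a name pattern
such as grep("^r", names(junk)) is the safer choice.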

hope it helps,

Chuck Cleland

> Obviously not correct.
> 
> All comments are appreciated, Thanks, DaveT.
> *************************************
> Silviculture Data Analyst
> Ontario Forest Research Institute
> Ontario Ministry of Natural Resources
> david.john.thompson@ontario.ca
> http://ofri.mnr.gov.on.ca
> *************************************

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
