s-news
To: <s-news@lists.biostat.wustl.edu>
Subject: Summary (RE: select rows which have at least a given number of duplicates)
From: "vincent vinh-hung" <conrvhgv@az.vub.ac.be>
Date: Sun, 8 Oct 2006 12:38:54 +0200
In-reply-to: <OF4761BAA4.578BB4C6-ON862571F8.00579FAE-862571F8.0058158B@usgs.gov>
Thread-index: Acbj4MTE//vTYP3KQEqS4h1DDTq5VAG5CYFw
Many thanks to David Lorenz, who suggested do.call, and to
Bill Dunlap, who suggested table(), which is more general than tabulate().
David Pollard's web site also provided an example,
adaptable as:
m$Y[match(m$X,names(table(m$X)[table(m$X)>2]),nomatch=0)>0]
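Note that the line above returns only the Y values. A minimal sketch of the same table()/match() idea returning whole rows, here with a text key, might look like the following. It assumes R; the key values "u", "v", "w", "x" and the names counts, keep and rows are made up for illustration.

```r
# Illustrative data: same layout as the example in the thread, but with a text key.
m <- data.frame(X = c("u", "u", "v", "w", "x", "w", "u"),
                Y = c("a", "b", "c", "d", "e", "f", "g"),
                stringsAsFactors = FALSE)

# Count how often each key value occurs.
counts <- table(m$X)

# Names of the key values that occur more than twice.
keep <- names(counts)[counts > 2]

# match() returns 0 (nomatch) for keys not in 'keep', so '> 0' flags rows to retain.
rows <- m[match(m$X, keep, nomatch = 0) > 0, ]
print(rows)
```

Because the comparison is made against names(counts), which are character strings, the same expression should apply to numeric keys as well, since match() coerces both sides to a common type.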
With thanks,
Vincent

-----Original Message-----
From: s-news-owner@lists.biostat.wustl.edu 
[mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of David L Lorenz
Sent: Friday, September 29, 2006 6:02 PM
To: vincent vinh-hung
Cc: s-news@lists.biostat.wustl.edu; s-news-owner@lists.biostat.wustl.edu
Subject: Re: [S] select rows which have at least a given number of duplicates


Vincent, 
  The tabulate() function will convert anything to factors and process those
data correctly. It is the subsetting [m$X] that must be integer.
  You can use the by() function and do.call("rbind", ) to do what you want: 

do.call("rbind", by(m, m$X, function(x) if(nrow(x) > 2) x else NULL)) 
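As a rough sketch of how this suggestion behaves with a text key, assuming R (the key values and the names groups and kept are illustrative only):

```r
# Illustrative data with a character key instead of integers.
m <- data.frame(X = c("u", "u", "v", "w", "x", "w", "u"),
                Y = c("a", "b", "c", "d", "e", "f", "g"),
                stringsAsFactors = FALSE)

# Split the rows by key; keep a group only if it has more than 2 rows.
groups <- by(m, m$X, function(x) if (nrow(x) > 2) x else NULL)

# rbind() skips the NULL entries and stacks the surviving groups.
kept <- do.call("rbind", groups)
print(kept)
```

Two things to keep in mind with this approach: the result comes back ordered by group rather than in the original row order, and the row names are rebuilt from the group labels.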

  Take a look at the documentation for the functions to see what they do. 
Dave 




"vincent vinh-hung" <conrvhgv@az.vub.ac.be> 
Sent by: s-news-owner@lists.biostat.wustl.edu
09/29/2006 08:23 AM

To: <s-news@lists.biostat.wustl.edu>
cc:
Subject: [S] select rows which have at least a given number of duplicates

I would like to select rows from a table whose key value
occurs at least, say, 3 times (that is, rows with at least 3 duplicates).

> m <- as.data.frame(c(1,1,4,3,2,3,1))
> m$Y <-c("a","b","c","d","e","f","g")
> names(m) <- c("X","Y")
> m
 X Y 
1 1 a
2 1 b
3 4 c
4 3 d
5 2 e
6 3 f
7 1 g

The tabulate command seems to do the job:
> m[tabulate(m$X)[m$X]>2,]
 X Y 
1 1 a
2 1 b
7 1 g

But tabulate is limited to integers.
Are there other ways that could be applied to reals
or to text?
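
One possible direction, sketched here under the assumption of R: table() counts any values that can be turned into a factor, and the counts can then be looked up per row by name, which mirrors the tabulate(m$X)[m$X] idiom. The helper select_min_dups and the real-valued key below are hypothetical, and the sketch assumes the key has no missing values.

```r
# Hypothetical helper: keep the rows of 'df' whose 'key' value occurs at least 'n' times.
select_min_dups <- function(df, key, n) {
  counts <- table(key)                             # one count per distinct key value
  per_row <- as.vector(counts[as.character(key)])  # look up each row's count by name
  df[per_row >= n, , drop = FALSE]
}

# Illustrative data with a real-valued key.
m <- data.frame(X = c(0.5, 0.5, 2.25, 1.75, 1.1, 1.75, 0.5),
                Y = c("a", "b", "c", "d", "e", "f", "g"),
                stringsAsFactors = FALSE)

res <- select_min_dups(m, m$X, 3)
print(res)
```

Real values are grouped by their character representation here, so two numbers that print identically are treated as the same key.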

Thanks in advance for any suggestion.

Vincent Vinh-Hung


--------------------------------------------------------------------
This message was distributed by s-news@lists.biostat.wustl.edu.  To
unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
the BODY of the message:  unsubscribe s-news




