s-news

Re: select rows which have at least a given number of duplicates

To: "vincent vinh-hung" <conrvhgv@az.vub.ac.be>
Subject: Re: select rows which have at least a given number of duplicates
From: David L Lorenz <lorenz@usgs.gov>
Date: Fri, 29 Sep 2006 11:02:08 -0500
Cc: s-news@lists.biostat.wustl.edu, s-news-owner@lists.biostat.wustl.edu
In-reply-to: <200609291310.k8TDAmC2010603@pluto.az.vub.ac.be>

Vincent,
  The tabulate() function converts anything to a factor and processes those data correctly; it is the subscript [m$X] that must be an integer.
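Here is a sketch of that idea in R (an assumption: S-PLUS details may differ). Recoding the values with match(x, unique(x)) gives integer codes for integers, reals or text alike, so both tabulate() and the subscript receive integers. The helper rows_with_min_count() and the extra columns W and Z are hypothetical and exist only for illustration.

```r
# Hypothetical helper: keep the rows of df whose value in column `col`
# occurs at least k times. Works for integer, real or character columns.
rows_with_min_count <- function(df, col, k) {
  x <- df[[col]]
  # match() maps each value to an integer code 1..(number of distinct values),
  # so tabulate() and the subscript both work on integers.
  code <- match(x, unique(x))
  counts <- tabulate(code)
  df[counts[code] >= k, , drop = FALSE]
}

# Vincent's data, plus hypothetical text (W) and real (Z) columns.
m <- data.frame(X = c(1, 1, 4, 3, 2, 3, 1),
                Y = c("a", "b", "c", "d", "e", "f", "g"),
                W = c("u", "v", "u", "w", "u", "v", "t"),
                Z = c(0.5, 0.5, 2.25, 0.5, 1.75, 2.25, 9),
                stringsAsFactors = FALSE)

rows_with_min_count(m, "X", 3)  # rows 1, 2 and 7 (X == 1)
rows_with_min_count(m, "W", 3)  # rows 1, 3 and 5 (W == "u")
rows_with_min_count(m, "Z", 2)  # rows 1, 2, 3, 4 and 6
```

Because match() compares reals exactly, values that differ only by rounding error are counted as distinct.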
  You can use the by() function together with do.call("rbind", ...) to do what you want:

do.call("rbind", by(m, m$X, function(x) if(nrow(x) > 2) x else NULL))

  Take a look at the documentation for the functions to see what they do.
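The by() call above can be wrapped with the threshold as an argument. This is a sketch assuming R; keep_groups_by() is a hypothetical name, and the text column W is hypothetical too. by() splits the rows by the values of the grouping column (numbers or text), the function returns each large enough group or NULL, and rbind() skips the NULL pieces.

```r
# Hypothetical wrapper around the by()/do.call("rbind", ...) idiom.
# Returns the rows whose value in column `col` occurs at least k times,
# ordered by group; returns NULL if no group qualifies.
keep_groups_by <- function(df, col, k) {
  pieces <- by(df, df[[col]], function(x) if (nrow(x) >= k) x else NULL)
  do.call("rbind", pieces)
}

m <- data.frame(X = c(1, 1, 4, 3, 2, 3, 1),
                Y = c("a", "b", "c", "d", "e", "f", "g"),
                W = c("u", "v", "u", "w", "u", "v", "t"),
                stringsAsFactors = FALSE)

keep_groups_by(m, "X", 3)  # the three rows with X == 1
keep_groups_by(m, "W", 2)  # the "u" group, then the "v" group
```

Note that the rows come back grouped (in sorted group order) rather than in their original order, and rbind() builds new row names from the group labels.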
Dave



"vincent vinh-hung" <conrvhgv@az.vub.ac.be>
Sent by: s-news-owner@lists.biostat.wustl.edu
09/29/2006 08:23 AM

To: <s-news@lists.biostat.wustl.edu>
cc:
Subject: [S] select rows which have at least a given number of duplicates

I would like to select the rows of a table whose value in a given
column occurs at least, say, 3 times.

> m <- as.data.frame(c(1,1,4,3,2,3,1))
> m$Y <-c("a","b","c","d","e","f","g")
> names(m) <- c("X","Y")
> m
 X Y
1 1 a
2 1 b
3 4 c
4 3 d
5 2 e
6 3 f
7 1 g

The tabulate() function seems to do the job:
> m[tabulate(m$X)[m$X]>2,]
 X Y
1 1 a
2 1 b
7 1 g

But tabulate() is limited to integers.
Are there other approaches that work for real numbers
or for text?
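One possible answer, sketched in R (an assumption; S-PLUS may differ): table() counts values of any type, and its names are those values converted to text, so each row's count can be looked up by name. The column W is a hypothetical text column added for illustration. Because the lookup goes through character strings, reals are effectively compared by their character representation.

```r
m <- data.frame(X = c(1, 1, 4, 3, 2, 3, 1),
                Y = c("a", "b", "c", "d", "e", "f", "g"),
                W = c("u", "v", "u", "w", "u", "v", "t"),
                stringsAsFactors = FALSE)

# table() tallies each distinct value; the table's names are those values as text.
counts_X <- table(m$X)
# Look up every row's count by name and keep rows whose value occurs at least 3 times.
m[as.vector(counts_X[as.character(m$X)]) >= 3, ]

# The same lookup works unchanged for a character column.
counts_W <- table(m$W)
m[as.vector(counts_W[m$W]) >= 3, ]
```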

Thanks in advance for any suggestions.

Vincent Vinh-Hung


--------------------------------------------------------------------
This message was distributed by s-news@lists.biostat.wustl.edu.  To
unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
the BODY of the message:  unsubscribe s-news
