I don't really understand your approach to it, but if I understand your
question correctly, I think you can simply do the following:
> vec <- anc.all.no.doublons$nosoum
> ttt.ind <- seq(length(vec))[vec %in% vec[duplicated(vec)]]
This took about 1 second on my Linux box for a vector of length
about 200K, most of whose entries were duplicates.
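
To make the idea concrete, here is a small sketch on toy data. The vector
below is made up and merely stands in for anc.all.no.doublons$nosoum; the
name dup.values is mine. The second formulation relies on the fromLast
argument of duplicated, which exists in R but which I would not assume is
available in S-PLUS 2000.

```r
# Toy data (hypothetical); stands in for anc.all.no.doublons$nosoum
vec <- c("a", "b", "a", "c", "b", "a", "d")

# Values that occur more than once somewhere in the vector
dup.values <- unique(vec[duplicated(vec)])

# Indices of every entry whose value is repeated,
# including the first occurrence of each repeated value
ttt.ind <- which(vec %in% dup.values)

# Equivalent in R: flag repeats scanning from the front and from the back
ttt.ind2 <- which(duplicated(vec) | duplicated(vec, fromLast = TRUE))
```

Both versions are vectorized, so they avoid the one-comparison-per-duplicate
loop that apply performs over the whole column.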
Gerald.Jean@spgdag.ca wrote:
>
> Hello S-users,
>
> S-2000, R3; NT4.0, SP5
>
> I have a large data.frame and I want to find the indices of duplicate
> entries in one of the columns. Through the function duplicated I can find
> the indices of entries which have appeared before, but what I am interested
> in is the indices of all entries appearing more than once. I have been
> implementing this through the apply function but it takes forever to run.
> In my application here I have roughly 180K observations and duplicated
> tells me that roughly 30K have appeared before, hence I am after 60K or
> more indices. Here is my very slow way of doing it:
>
> ttt.duplicated <- duplicated(anc.all.no.doublons[, 'nosoum'])
> ttt.paste.all <- anc.all[, 'nosoum']
> ttt.paste <- anc.all[, 'nosoum'][ttt.duplicated]
> ttt.ind <- apply(as.matrix(ttt.paste), 1,
>                  FUN = function(x, y) which(y == x),
>                  ttt.paste.all)
> ttt.ind <- unlist(ttt.ind)
> ttt.ind <- sort(ttt.ind)
>
> Any hint on how to speed that up?
>
> Thanks,
>
> Gérald Jean
> Analyste-conseil (statistiques), Actuariat
> télephone : (418) 835-4900 poste (7639)
> télecopieur : (418) 835-6657
> courrier électronique: gerald.jean@spgdag.ca
>
> "In God we trust all others must bring data" W. Edwards Deming
>
> ---------------------------------------------------------------------
> This message was distributed by s-news@lists.biostat.wustl.edu. To
> unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
> the BODY of the message: unsubscribe s-news
--
-----------------------------------------------------------------
Pierre Kleiber Email: pkleiber@honlab.nmfs.hawaii.edu
Fishery Biologist Tel: 808 983-5399/737-7544
NOAA FISHERIES - Honolulu Laboratory Fax: 808 983-2902
2570 Dole St., Honolulu, HI 96822-2396
-----------------------------------------------------------------
"God could have told Moses about galaxies and mitochondria and
all. But behold... It was good enough for government work."
-----------------------------------------------------------------