detecting repeated values in a vector (summary of solutions)

To: s-news@lists.biostat.wustl.edu
Subject: detecting repeated values in a vector (summary of solutions)
From: Caitlin Burgess <caitlin@cbr.washington.edu>
Date: Fri, 5 May 2006 11:07:04 -0700 (PDT)
Cc: calder@phz.com, bill@insightful.com, pburns@pburns.seanet.com, timh@insightful.com, chuck@insightful.com, ssu@thegeorgeinstitute.org, dmck@u.washington.edu, mstuder@insightful.com
Thank you very much to all of you who replied to my question. From
the replies I learned about the functions rle() and duplicated(). Thanks
very much to those who included code to solve the problems of detecting
long runs: finding their length, finding the values associated with them, 
and finding the positions at which they start. I'm copying in two of these 
examples, using rle().  
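
Since duplicated() is mentioned above but not shown, here is a small sketch of how it differs from rle(). The vector x is made up for illustration and is not from any of the replies. duplicated() flags a value that has appeared anywhere earlier in the vector, while rle() only groups consecutive repeats, which is why rle() is the better fit for finding long runs.

```r
# illustrative data (an assumption, not from the original replies)
x <- c(1, 2, 2, 3, 1)

# TRUE wherever a value has already occurred earlier in x, adjacent or not
dup <- duplicated(x)    # FALSE FALSE TRUE FALSE TRUE

# consecutive runs only: the final 1 starts a new run of length 1
runs <- rle(x)
runs$lengths            # 1 2 1 1
runs$values             # 1 2 3 1
```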

EXAMPLE #1 USING rle(): FINDS LONG SEQUENCES AND THE VALUES ASSOCIATED
WITH THEM:
# generate some data
k <- 6
test <- c(1, 7, 3, 8, 8, 8, 2, 2, 2, 1, 1, 2, 1, 4, 4, 4, rep(5, k), 9, 9,
          rep(7, k + 4), 8, 8)

# run length encoding
rle.test <- rle(test)
# flag the runs whose length is greater than or equal to k
idx <- rle.test$lengths >= k
# get corresponding values
rle.test$values[idx]


EXAMPLE #2 USING rle(): FINDS POSITIONS AT WHICH LONG SEQUENCES BEGIN
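longRunsAt <- function(x, k) {
        # where do runs in x of length >= k start?
        z <- rle(x)
        i <- z$lengths >= k
        # start of each run: 1 for the first run, then 1 plus the
        # cumulative lengths of all preceding runs
        cumsum(c(1, z$lengths[-length(z$lengths)]))[i]
}
longRunsAt(c(1, 2, 3, 3, 3, 3, 4, 4, 1, 1, 1, 1, 2, 2), k = 4)
# [1] 3 9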
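
The two examples can also be combined so that one call reports the start, length, and value of every long run. This is only a sketch along the same lines as the code above; the function name longRuns and its data frame layout are my own choices and were not in any of the replies.

```r
# hypothetical helper (name and output layout are assumptions)
longRuns <- function(x, k) {
  z <- rle(x)
  keep <- z$lengths >= k
  # start position of every run, as in longRunsAt()
  start <- cumsum(c(1, z$lengths[-length(z$lengths)]))
  data.frame(start  = start[keep],
             length = z$lengths[keep],
             value  = z$values[keep])
}

res <- longRuns(c(1, 2, 3, 3, 3, 3, 4, 4, 1, 1, 1, 1, 2, 2), k = 4)
res
# two rows: a run of 3s starting at position 3 and a run of 1s
# starting at position 9, each of length 4
```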



