s-news

Summary of responses to: This must be a simple task...

To: S-News List <s-news@lists.biostat.wustl.edu>
Subject: Summary of responses to: This must be a simple task...
From: "Kim Elmore" <Kim.Elmore@noaa.gov>
Date: Mon, 25 Jul 2005 11:08:20 -0500
In-reply-to: <Pine.SOL.4.32.0507222111480.18308-100000@jaques>
References: <6.2.1.2.2.20050722213007.02c96ed0@129.15.69.4> <Pine.SOL.4.32.0507222111480.18308-100000@jaques>
I'll restate my problem using a simple example. Consider two vectors, x and y. The vector y contains all of x, plus additional data. There are two twists: both vectors contain duplicate values, and the elements of x and y have been permuted, so the elements of x are no longer contiguous within y. I need to remove x from y element-wise, that is, remove one matching element of y for each element of x. Here is an example:

x <- c(1, 2, 3, 1.5, 1.5, 8)
y <- c(2, 1.5, 1.5, 1.5, 1.5, 1.5, 2, 3, 9, 11)

The required result is the vector:

[1]  1.5  1.5  1.5  2.0  9.0 11.0
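For comparison, here is a straightforward baseline that is not one of the posted answers: walk through x and, for each value, drop the first matching element of y using match(). The function name remove.each is hypothetical. Values of x that never occur in y (here 1 and 8) are simply skipped, which agrees with the required result above. Because it rescans y once per element of x, this is only a sketch for small vectors; the counting approaches in the replies scale better.

```r
# Hypothetical baseline (not from the thread): remove one matching
# element of y for each element of x, one at a time.
remove.each <- function(x, y)
{
    for (v in x) {
        i <- match(v, y)           # first position of v in y, NA if absent
        if (!is.na(i)) y <- y[-i]  # drop that single occurrence
    }
    sort(y)                        # sorted, like the required result
}

x <- c(1, 2, 3, 1.5, 1.5, 8)
y <- c(2, 1.5, 1.5, 1.5, 1.5, 1.5, 2, 3, 9, 11)
remove.each(x, y)   # returns c(1.5, 1.5, 1.5, 2, 9, 11)
```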

My thanks to Sam Buttrey, Thomas Jagger, and Bill Dunlap for providing answers.

Thomas Jagger suggested using rle() and packaged it as a function, complete with error/sanity checks:

rem.ainb <- function(a, b)
{
        brl <- rle(sort(b))  # "contains all of a"
        arl <- rle(sort(a))
        ainb <- match(arl$values, brl$values, nomatch = NA)
        if(any(is.na(ainb))) stop("a has at least one value that is not in b")
        brl$lengths[ainb] <- brl$lengths[ainb] - arl$lengths
        if(any(brl$lengths < 0))
                stop("There are more duplicates for some value in a than this value in b")
        return(rep(brl$values, brl$lengths))
}
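A usage sketch for rem.ainb(). The x in the example above contains 1 and 8, neither of which occurs in y, so on the posted data the function stops with its first error message. Dropping those values first with %in% (an addition of ours, not part of Thomas's reply) gives the required result.

```r
# rem.ainb() as posted by Thomas Jagger
rem.ainb <- function(a, b)
{
        brl <- rle(sort(b))  # run lengths = counts of each distinct value in b
        arl <- rle(sort(a))
        ainb <- match(arl$values, brl$values, nomatch = NA)
        if(any(is.na(ainb))) stop("a has at least one value that is not in b")
        brl$lengths[ainb] <- brl$lengths[ainb] - arl$lengths
        if(any(brl$lengths < 0))
                stop("There are more duplicates for some value in a than this value in b")
        return(rep(brl$values, brl$lengths))
}

x <- c(1, 2, 3, 1.5, 1.5, 8)
y <- c(2, 1.5, 1.5, 1.5, 1.5, 1.5, 2, 3, 9, 11)

res <- try(rem.ainb(x, y), silent = TRUE)
inherits(res, "try-error")   # TRUE: 1 and 8 are not in y

rem.ainb(x[x %in% y], y)     # returns c(1.5, 1.5, 1.5, 2, 9, 11)
```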

Both Sam and Bill used table() in their solutions. Sam also packaged his solution as a function:

kim <- function(x, y)
{
        # Make table of x and y combined, to preserve levels
        tbl <- table(c(x, y))
        #
        # Zero out "tbl" and keep a zeroed copy, "t2", for y's counts
        #
        tbl[] <- 0
        t2 <- tbl
        # Fill "tbl" with x's counts...
        x.tbl <- table(x)
        tbl[names(x.tbl)] <- x.tbl
        # ...now fill "t2" with y's counts...
        y.tbl <- table(y)
        t2[names(y.tbl)] <- y.tbl
        #...and subtract
        t3 <- t2 - tbl
        #
        # Where t3 is positive, y had excess over x; where it's negative,
        # x had excess and we don't care.
        #
        t3[t3 < 0] <- 0
        return(as.numeric(rep(names(t3), t3)))
}
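To see why the clipping step in kim() matters, the sketch below runs the same steps on the example data outside the function so the intermediate counts can be inspected. On this data, t3 is -1 for the levels 1 and 8, the two values of x that never occur in y; setting those entries to 0 is what lets kim() return the required result.

```r
x <- c(1, 2, 3, 1.5, 1.5, 8)
y <- c(2, 1.5, 1.5, 1.5, 1.5, 1.5, 2, 3, 9, 11)

# Same steps as kim(), written out so the intermediate counts are visible
tbl <- table(c(x, y))          # levels: 1 1.5 2 3 8 9 11
tbl[] <- 0
t2 <- tbl
x.tbl <- table(x)
tbl[names(x.tbl)] <- x.tbl     # counts from x
y.tbl <- table(y)
t2[names(y.tbl)] <- y.tbl      # counts from y
t3 <- t2 - tbl
t3                             # -1 3 1 0 -1 1 1 for levels 1 1.5 2 3 8 9 11
t3[t3 < 0] <- 0                # 1 and 8 in x have no partner in y
res <- as.numeric(rep(names(t3), t3))
res                            # 1.5 1.5 1.5 2 9 11
```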

Bill shortened this considerably, to just four lines. I've packaged it as a function here:

x.outof.y.fun <- function(x, y)
{
   lvls <- sort(unique(y))
   tx <- table(factor(x, levels = lvls))
   ty <- table(factor(y, levels = lvls))
   return(rep(lvls, ty-tx))
}
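A usage sketch for x.outof.y.fun(). Values of x that are absent from y become NA in factor(x, levels = lvls) and are dropped by table(), so the posted example works as is. If x held more copies of a value than y, however, ty - tx would contain a negative count and rep() would signal an error. The variant x.outof.y.safe() below is a hypothetical tweak, not from the thread, that clamps those counts at zero with pmax(), mirroring the t3[t3 < 0] <- 0 step in Sam's version.

```r
# Bill's approach, as packaged above
x.outof.y.fun <- function(x, y)
{
   lvls <- sort(unique(y))
   tx <- table(factor(x, levels = lvls))
   ty <- table(factor(y, levels = lvls))
   return(rep(lvls, ty-tx))
}

# Hypothetical variant (not from the thread): clamp negative counts to
# zero so that x may hold more copies of a value than y does.
x.outof.y.safe <- function(x, y)
{
   lvls <- sort(unique(y))
   tx <- table(factor(x, levels = lvls))  # x values absent from y -> NA, dropped
   ty <- table(factor(y, levels = lvls))
   rep(lvls, pmax(as.vector(ty - tx), 0))
}

x <- c(1, 2, 3, 1.5, 1.5, 8)
y <- c(2, 1.5, 1.5, 1.5, 1.5, 1.5, 2, 3, 9, 11)
x.outof.y.fun(x, y)          # returns c(1.5, 1.5, 1.5, 2, 9, 11)

# Two 3s in x but only one in y: the original errors, the variant does not
x.outof.y.safe(c(3, 3), y)   # returns c(1.5, 1.5, 1.5, 1.5, 1.5, 2, 2, 9, 11)
```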

My thanks to all of you for your kind and timely assistance!

Kim Elmore
                          Kim Elmore, Ph.D.
                       University of Oklahoma
        Cooperative Institute for Mesoscale Meteorological Studies
"All of weather is divided into three parts: Yes, No, and Maybe. The
greatest of these is Maybe." The original Latin appears to be garbled.

