s-news

Re: This must be a simple task...

To: <s-news@lists.biostat.wustl.edu>
Subject: Re: This must be a simple task...
From: "Thomas Jagger" <tjagger@blarg.net>
Date: Sat, 23 Jul 2005 10:19:36 -0600
In-reply-to: <6.2.3.4.2.20050722162032.0385be60@129.15.69.4>
Thread-index: AcWPC80yURx3h+JYQ+6VnyH5BpyMfQAAeArw
Kim,

Hi, maybe I am missing something here. Columns in a data frame all have the
same length, so what are you replacing, say, the three values of 1.5 with?
If you mean that you have a list and order is not an issue, then use rle().

The function below stops with an error message if your assumption that each
column contains the previous column's data is violated, either because a
unique value has been dropped or because the number of duplicates has been
reduced.
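
For anyone who has not used rle(), here is a small illustration of what it
returns on a sorted vector. This is only a sketch in R syntax; the vector x is
made up for the example and is not from Kim's data.

```r
# Hypothetical vector, used only to illustrate rle()
x <- c(2, 1.5, 1, 1.5, 2)

# Sorting first puts equal values next to each other, so each distinct
# value forms exactly one run
runs <- rle(sort(x))
runs$values    # distinct values: 1.0 1.5 2.0
runs$lengths   # how often each occurs: 1 2 2

# rep() rebuilds the sorted vector from the values and their counts
rebuilt <- rep(runs$values, runs$lengths)
```

Subtracting counts in runs$lengths and then calling rep() is what lets a
value be removed a fixed number of times rather than entirely.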

a<-c(1.0,1.5,1.5,2,2)
b<-c(a,c(1.5,1.5,2))

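rem.ainb<-function(a,b)
{
        # run-length encode the sorted vectors: distinct values and their counts
        brl<-rle(sort(b)) # b is assumed to contain all of a
        arl<-rle(sort(a))
        # position of each distinct value of a among the distinct values of b
        ainb<-match(arl$values,brl$values,nomatch=NA)
        if(any(is.na(ainb))) stop("a has at least one value that is not in b")
        # subtract the counts in a from the matching counts in b
        brl$lengths[ainb]<-brl$lengths[ainb]-arl$lengths
        if(any(brl$lengths<0))
                stop("Some value has more duplicates in a than in b")
        return(rep(brl$values,brl$lengths))
}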
### Start from the last column (the one with the most data) and work
### backwards, e.g.:

tmp<-list(a=a,b=b,c=c(b,c(3.0,2.0,4.0)))

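tmpfun<-function(x)
{
        if(length(x)<2) return(x) # nothing to remove; length(x):2 would count up
        # go backwards so x[[i-1]] is still unmodified when removed from x[[i]]
        for(i in length(x):2)
        {
                x[[i]]<-rem.ainb(x[[i-1]],x[[i]])
        }
        x
}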

tmpfun(tmp)

> tmpfun(tmp)
$a:
[1] 1.0 1.5 1.5 2.0 2.0

$b:
[1] 1.5 1.5 2.0

$c:
[1] 2 3 4
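
If the original order within a column does matter, one possible alternative
(not the rle approach above, and only a sketch in R syntax) is to number each
occurrence of a value and remove matching value/occurrence pairs. The names
occurrence, remove_once, col1 and col2 are hypothetical. Unlike rem.ainb, this
version does no checking that the first vector is fully contained in the
second, and it relies on paste() giving identical text for equal numbers.

```r
# Hypothetical helper: for each element, which occurrence of its value it is
# (1 for the first 1.5, 2 for the second 1.5, and so on)
occurrence <- function(v) ave(seq_along(v), v, FUN = seq_along)

# Remove each element of prev exactly once from cur, keeping cur's order
remove_once <- function(prev, cur) {
  prev_key <- paste(prev, occurrence(prev))
  cur_key  <- paste(cur,  occurrence(cur))
  cur[!(cur_key %in% prev_key)]
}

# Made-up data: two 1.5s in col1, five 1.5s in col2
col1 <- c(1.5, 2, 1, 2, 1.5)
col2 <- c(2, 1.5, 3, 1.5, 1, 1.5, 2, 1.5, 1.5)
remove_once(col1, col2)   # leaves 3 and three values of 1.5
```

The k-th copy of a value in col1 cancels only the k-th copy in col2, so the
extra duplicates survive and no explicit for loop is needed.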


Let me know if this works.


Tom Jagger

-----Original Message-----
From: s-news-owner@lists.biostat.wustl.edu
[mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Kim Elmore
Sent: Friday, July 22, 2005 4:22 PM
To: S-News
Subject: [S] This must be a simple task...

I have a data frame (several, really) that has a structure I'm 
trying to unravel. After column 1, which contains duplicates, each 
succeeding column contains new data *plus* all the data from column 
1. Thus column 2 contains the entire contents of column 1 plus new 
stuff, which itself may contain duplicates. Column 3 contains the 
entire contents of column 2 plus new stuff, etc. I want to 
restructure the data so that each column contains only the data 
unique to it, but retaining any of the original duplicates.

Things would be much easier if the new data had merely been pasted 
onto the previous column's data, but such is not the case: the 
original order is scrambled.

For example, let's say there are two values of 1.5 in column 1, and 
column 2 contains five values of 1.5 (the two values from column 1 
plus three others that should remain in column 2). When I'm done, I 
want there to be three 1.5 values in column 2.

Every cute trick I've tried removes *all* duplicates such that, in my 
above example, I am left with only one value of 1.5. There should be 
an easy, elegant (no for loops) way to do this, but I'm missing it
altogether.

Kim Elmore

How do I remove exactly what's in column 1 from column 2 but leave whatever

I've tried is.element() (a.k.a. %in%), match(), intersect(), and 
duplicated(), but I also lose all of the proper duplicates.
                           Kim Elmore, Ph.D.
                        University of Oklahoma
         Cooperative Institute for Mesoscale Meteorological Studies
"All of weather is divided into three parts: Yes, No, and Maybe. The
greatest of these is Maybe" The original Latin appears to be garbled.


