Kim,
Hi, maybe I am missing something here. Columns in a data frame all have the
same length, so what are you replacing the, say, three values of 1.5 with?
If you mean that you have a list and order is not an issue, then use rle.
The function below stops with an error message if your assumption that each
column contains the previous column's data does not hold, either because a
unique value has been dropped or because the number of duplicates has been
reduced:
a<-c(1.0,1.5,1.5,2,2)
b<-c(a,c(1.5,1.5,2))
rem.ainb<-function(a,b)
{
  # Remove one copy of every element of a from b; b must contain all of a.
  brl<-rle(sort(b))   # counts of each distinct value in b
  arl<-rle(sort(a))   # counts of each distinct value in a
  ainb<-match(arl$values,brl$values,nomatch=NA)
  if(any(is.na(ainb))) stop("a has at least one value that is not in b")
  brl$lengths[ainb]<-brl$lengths[ainb]-arl$lengths
  if(any(brl$lengths<0))
    stop("Some value has more duplicates in a than in b")
  # Rebuild b, in sorted order, from the reduced counts
  return(rep(brl$values,brl$lengths))
}
Start from the last column (the one with the most duplicates) and work
backwards, e.g.:
tmp<-list(a=a,b=b,c=c(b,c(3.0,2.0,4.0)))
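tmpfun<-function(x)
{
  if(length(x)<2) return(x)   # a single column has nothing to strip
  # Work backwards so that x[[i-1]] is still the original, unreduced column
  for(i in length(x):2)
  {
    x[[i]]<-rem.ainb(x[[i-1]],x[[i]])
  }
  x
}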
tmpfun(tmp)
> tmpfun(tmp)
$a:
[1] 1.0 1.5 1.5 2.0 2.0
$b:
[1] 1.5 1.5 2.0
$c:
[1] 2 3 4
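Note that rem.ainb returns each column sorted. If the original order within a
column matters, or the loop over columns is unwelcome, one possible alternative
is to tag every value with its occurrence number and drop the tagged values
that also occur in the previous column. The following is only a sketch, written
for R and assuming ave() is available; the names occ, multiset.diff and unravel
are made up for illustration, and unlike rem.ainb it does not complain when the
previous column contains something the current one lacks.

```r
# Tag each element with its occurrence number within its own value,
# e.g. c(1.5, 2, 1.5) becomes "1.5 1", "2 1", "1.5 2".
occ <- function(x) paste(x, ave(seq_along(x), x, FUN = seq_along))

# Keep the elements of b whose (value, occurrence) tag does not occur in a.
# This removes exactly as many copies of each value as a holds and
# leaves the surviving elements of b in their original order.
multiset.diff <- function(a, b) b[!(occ(b) %in% occ(a))]

# Apply the difference to every column after the first, always comparing
# against the original (unreduced) previous column.
unravel <- function(x) {
  out <- x
  out[-1] <- lapply(seq_along(x)[-1],
                    function(i) multiset.diff(x[[i - 1]], x[[i]]))
  out
}

# Two 1.5s in column 1, five (scrambled) in column 2
col1 <- c(1.0, 1.5, 1.5, 2.0)
col2 <- c(1.5, 3.0, 1.5, 1.0, 1.5, 2.0, 1.5, 1.5)
print(multiset.diff(col1, col2))

# Same test list as above
a <- c(1.0, 1.5, 1.5, 2, 2)
b <- c(a, c(1.5, 1.5, 2))
tmp <- list(a = a, b = b, c = c(b, c(3.0, 2.0, 4.0)))
res <- unravel(tmp)
print(res)
```

With the two-column example, three values of 1.5 remain, together with the 3.0
that was new in column 2. The tags are built with paste(), so values are
compared by their printed representation; that is an assumption that may not
suit data where tiny floating-point differences matter.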
If this will work let me know,
Tom Jagger
-----Original Message-----
From: s-news-owner@lists.biostat.wustl.edu
[mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Kim Elmore
Sent: Friday, July 22, 2005 4:22 PM
To: S-News
Subject: [S] This must be a simple task...
I have a data frame (several, really) that have a structure I'm
trying to unravel. After column 1, which contains duplicates, each
succeeding column contains new data *plus* all the data from column
1. Thus column 2 contains the entire contents of column 1 plus new
stuff, which itself may contain duplicates. Column 3 contains the
entire contents of column 2 plus new stuff, etc. I want to
restructure the data so that each column contains only the data
unique to it, but retaining any of the original duplicates.
Things would be much easier if the new data had merely been pasted
onto the previous column's data, but such is not the case: the
original order is scrambled.
For example, let's say there are two values of 1.5 in column 1, and
column 2 contains five values of 1.5 (the two values from column 1
plus three others that should remain in column 2). When I'm done, I
want there to be three 1.5 values in column 2.
Every cute trick I've tried removes *all* duplicates such that, in my
above example, I am left with only one value of 1.5. There should be
an easy, elegant (no for loops) way to do this, but I'm missing it
altogether.
Kim Elmore
How do I remove exactly what's in column 1 from column 2 but leave whatever
I've tried is.element() (a.k.a. %in%), match(), intersect(), and
duplicated(), but with each of them I also lose all of the proper duplicates.
Kim Elmore, Ph.D.
University of Oklahoma
Cooperative Institute for Mesoscale Meteorological Studies
"All of weather is divided into three parts: Yes, No, and Maybe. The
greatest of these is Maybe" The original Latin appears to be garbled.