s-news

Re: This must be a simple task...

To: S-News Mail List <s-news@lists.biostat.wustl.edu>
Subject: Re: This must be a simple task...
From: "Kim Elmore" <Kim.Elmore@noaa.gov>
Date: Fri, 22 Jul 2005 21:49:41 -0500
In-reply-to: <Pine.SOL.4.32.0507221521250.18308-100000@jaques>
References: <6.2.3.4.2.20050722162032.0385be60@129.15.69.4> <Pine.SOL.4.32.0507221521250.18308-100000@jaques>
I apologize for not being clearer; I admit I was in a hurry to get the kids from day care. Regardless, I don't mean for the group to divine my problem from my poor description: yes, these data are of different lengths. I get them into a data frame by padding the columns with an appropriate number of NA values. I operate on the data columns by using na.omit() whenever necessary. This probably isn't the best way to go, but it makes sense to me for the data I have.

I'm at home and don't have the data here. Sam Buttrey's example got close but left out the 8, so I'll modify it:

x <- c(1, 2, 3, 1.5, 1.5, 8)
y <- c(2, 1.5, 1.5, 1.5, 1.5, 1.5, 2, 1, 3, 9, 11, 8)

Note that the vector y contains every element of x, but in a permuted order. The vector y also contains duplicate values of its own, and I want to retain those.

The result I'm after is:

c(1.5, 1.5, 1.5, 2, 9, 11). This is the vector y with one occurrence removed for each element of x, so the duplicates that belong only to y survive.
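For illustration, here is one loop-free way the removal could be written. It is a minimal sketch in R, not something taken from Sam's or Tom's replies, and it has not been checked against S-PLUS. The names multiset_diff and occurrence are hypothetical. The idea is to tag each value with its occurrence number (first 1.5, second 1.5, and so on) and drop only the tags that also appear in x. It assumes neither vector contains NA, so apply na.omit() first.

```r
# Hypothetical helper: remove one occurrence from y for each element of x.
# Assumes x and y contain no NA values.
multiset_diff <- function(y, x) {
  # Number each value by how many times it has appeared so far.
  occurrence <- function(v) {
    if (length(v) == 0) return(integer(0))
    ave(seq_along(v), v, FUN = seq_along)
  }
  key_y <- paste(y, occurrence(y))  # e.g. "1.5 1", "1.5 2", ...
  key_x <- paste(x, occurrence(x))
  y[!(key_y %in% key_x)]
}

x <- c(1, 2, 3, 1.5, 1.5, 8)
y <- c(2, 1.5, 1.5, 1.5, 1.5, 1.5, 2, 1, 3, 9, 11, 8)
result <- multiset_diff(y, x)
print(result)  # 1.5 1.5 1.5 2.0 9.0 11.0
```

Because the first occurrences in y are the ones dropped, the surviving elements keep their original relative order.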

The data frames I have contain 6 columns. Column 1 is independent. Column 2 contains new data plus all of column 1; column 3 contains new data plus all of columns 1 and 2, and so on out to column 6.
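The column layout just described could be unravelled along the following lines. Again this is only a sketch in R under stated assumptions: the data frame df and its column names a, b, c are made up for the example (three columns instead of six), unravel and multiset_diff are hypothetical names, and each column is assumed to contain the complete NA-stripped contents of the column before it.

```r
# Hypothetical helper: remove one occurrence from y for each element of x.
multiset_diff <- function(y, x) {
  occurrence <- function(v) {
    if (length(v) == 0) return(integer(0))
    ave(seq_along(v), v, FUN = seq_along)
  }
  y[!(paste(y, occurrence(y)) %in% paste(x, occurrence(x)))]
}

# Strip the NA padding, then subtract each column from its successor.
unravel <- function(df) {
  cols <- lapply(df, function(col) col[!is.na(col)])
  c(cols[1], Map(multiset_diff, cols[-1], cols[-length(cols)]))
}

# Made-up data: b holds all of a plus (2, 1.5); c holds all of b plus (3, 3).
df <- data.frame(a = c(1, 1.5, 1.5, NA, NA, NA, NA),
                 b = c(1.5, 2, 1, 1.5, 1.5, NA, NA),
                 c = c(2, 1.5, 3, 1.5, 1, 3, 1.5))
res <- unravel(df)  # list: a = (1, 1.5, 1.5), b = (2, 1.5), c = (3, 3)

# Optionally pad back to a data frame with NA, as in the original layout.
n <- max(sapply(res, length))
padded <- as.data.frame(lapply(res, function(v) c(v, rep(NA, n - length(v)))))
print(padded)
```

The result is kept as a list first because the unique parts generally have different lengths; the padding step is only needed if a data frame is wanted again.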

Sam's example may do exactly what I want; I'll have to try it. I also see Tom Jagger's example; I'll try that, too.

Kim Elmore



At 05:22 PM 7/22/2005, you wrote:
On Fri, 22 Jul 2005, Kim Elmore wrote:

> I have a data frame (several, really) that have a structure I'm
> trying to unravel. After column 1, which contains duplicates, each
> succeeding column contains new data *plus* all the data from column
> 1. Thus column 2 contains the entire contents of column 1 plus new
> stuff, which itself may contain duplicates. Column 3 contains the
> entire contents of column 2 plus new stuff, etc. I want to
> restructure the data so that each column contains only the data
> unique to it, but retaining any of the original duplicates.
>
> Things would be much easier if the new data had merely been pasted
> onto the previous column's data, but such is not the case: the
> original order is scrambled.
>
> For example, let's say there are two values of 1.5 in column 1, and
> column 2 contains five values of 1.5 (the two values from column 1
> plus three others that should remain in column 2). When I'm done, I
> want there to be three 1.5 values in column 2.
>
> Every cute trick I've tried removes *all* duplicates such that, in my
> above example, I am left with only one value of 1.5. There should be
> an easy, elegant (no for loops) way to do this, but I'm missing it altogether.

I'm having trouble understanding the problem.  Do you have
a fairly short input dataset along with the desired result?

----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
Bill at Insightful dot com
360-428-8146

 "All statements in this message represent the opinions of the author and do
 not necessarily reflect Insightful Corporation policy or position."

