s-news

Re: Beginner's question: Merging dataset

To: "Bos, Roger" <BosR@ny.rothinc.com>
Subject: Re: Beginner's question: Merging dataset
From: Jindan Zhou <jindan@jindan.homedns.org>
Date: Mon, 13 Dec 2004 13:26:40 -0600
Cc: 'Jindan Zhou' <jindan@jindan.homedns.org>, SundarDorai-Raj <sundar.dorai-raj@pdf.com>, s-news@lists.biostat.wustl.edu
In-reply-to: <E07964B84690CC47B01421B2A71D4F0F095F4AC8@rinnycs0000>
References: <E07964B84690CC47B01421B2A71D4F0F095F4AC8@rinnycs0000>

Yes, yes! That is it! I wasn't aware that the merge function existed ;-)

Thanks,
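
For the archive, a minimal sketch of the suggested merge call, run on two small data frames shaped like the ones in the original question. The rows are an illustrative subset, and site AL99 (with its Year and NO3 values) is hypothetical, added only to show what all.x=TRUE does when a site has no coordinates. This is written for R; S+ may differ in details such as the stringsAsFactors argument.

```r
# Illustrative subset of the two data frames from the question.
# AL99 is a hypothetical site with no entry in latlong.
mydata <- data.frame(
  SiteID    = c("AL02", "AL02", "AL10", "AL99"),
  Summ.Per. = c("Summer", "Fall", "Winter", "Spring"),
  Year      = c(2001, 2001, 1986, 1990),
  NO3       = c(0.86, 0.47, 0.62, 0.50),
  stringsAsFactors = FALSE
)
latlong <- data.frame(
  Site.ID   = c("AL02", "AL10", "AL24"),
  Latitude  = c(30.7905, 32.4583, 30.4741),
  Longitude = c(-87.8497, -87.2422, -88.1411),
  stringsAsFactors = FALSE
)

# The lookup table should hold one row per site; otherwise rows would multiply.
stopifnot(!any(duplicated(latlong$Site.ID)))

# Keep every row of mydata and look up the coordinates by site.
merged <- merge(mydata, latlong,
                by.x = "SiteID", by.y = "Site.ID", all.x = TRUE)
print(merged)
```

Each AL02 row receives the same Latitude and Longitude, the AL99 row is kept with NA coordinates, and AL24 does not appear because it has no rows in mydata.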

On Mon, 13 Dec 2004, Bos, Roger wrote:

> Jindan,
>
> I have used the same merge function in both R and S+ with no problems. Look
> at ?merge.  For example,
>
> new <- merge(X, Y, by.x="SiteID", by.y="Site.ID", all.x=TRUE)
>
> all.x=TRUE makes sure you don't lose any rows of X, including rows whose
> SiteID has no match in Y (they get NA in the new columns). Repeated
> observations in SiteID are kept either way; I know this because my biggest
> problem was getting rid of repeated observations, and I was told to use
> duplicated().
>
> HTH,
>
> Roger
>
> -----Original Message-----
> From: Jindan Zhou [mailto:jindan@jindan.homedns.org]
> Sent: Monday, December 13, 2004 2:07 PM
> To: s-news@lists.biostat.wustl.edu
> Subject: [S] Beginner's question: Merging dataset
>
>
> Hello members!
>
> I need to merge two datasets in the following manner:
>
> The first dataset, mydata, has 6 columns
>
>    SiteID Summ.Per. Year  NO3      Dates          X
> 1    AL02    Summer 2001 0.86   6/5/2001  8/28/2001
> 2    AL02      Fall 2001 0.47  8/28/2001 11/27/2001
> 3    AL02    Winter 2002 0.62 11/27/2001  2/26/2002
> ...
> 21   AL10    Winter 1986 0.62  12/3/1985   3/4/1986
> 22   AL10    Spring 1986 0.73   3/4/1986   6/3/1986
> 23   AL10    Summer 1986 1.36   6/3/1986   9/2/1986
>
> Notice each SiteID may appear multiple times.
>
> The second dataset, latlong, has 3 columns
>
>   Site.ID Latitude Longitude
> 1    AL02  30.7905  -87.8497
> 2    AL10  32.4583  -87.2422
> 3    AL24  30.4741  -88.1411
> ...
>
> Each Site.ID appears exactly once.
>
> What I need to do is to generate two additional columns in mydata, using
> information from latlong, for *each* occurrence of SiteID in mydata.
> What is the syntax in S for this job? Will the syntax be different from
> R? I have been struggling for the solution since last night without a
> clue, so sad!
>
> Thanks for any hint!
>
> Jindan
>
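
The quoted reply mentions duplicated() for getting rid of repeated observations. A minimal sketch of that idea in R, using a hypothetical data frame named obs whose third row repeats the first:

```r
# Hypothetical data frame; row 3 is an exact repeat of row 1.
obs <- data.frame(
  SiteID = c("AL02", "AL10", "AL02"),
  Year   = c(2001, 1986, 2001),
  NO3    = c(0.86, 0.62, 0.86),
  stringsAsFactors = FALSE
)

# duplicated() flags each row that repeats an earlier row.
is_repeat <- duplicated(obs)

# Keep only the first occurrence of each row.
obs_unique <- obs[!is_repeat, ]
print(obs_unique)
```

To keep one row per site regardless of the other columns, the same pattern can be applied to the key alone, for example obs[!duplicated(obs$SiteID), ].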
