Selecting data from a data frame

To: s-news@lists.biostat.wustl.edu
Subject: Selecting data from a data frame
From: "Data Analytics Corp." <dataanalytics@earthlink.net>
Date: Tue, 11 Nov 2008 08:00:03 -0500
Hi,

I have a data frame of countries with three variables: the country's isocode (a 3-letter country code), the year, and the population size in that year. Let's say there are 3 countries, each with anywhere from 1 to 6 years of population data. Basically, it's a panel dataset. An example would be:

   isocode year pop
   usa 1990 10
   usa 1991 12
   usa 1992 15
   usa 1993 13
   usa 1994 16
   usa 1995 17
   can 1992 5
   can 1993 6
   gbr 1997 15
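
In case it helps anyone reproduce this, here is a minimal sketch of how the example could be entered as a data frame. The name popdata is just a placeholder for illustration, and the sketch assumes data.frame() is called the same way in S+ and R.

```r
# Example panel data: one row per country-year.
# 'popdata' is a hypothetical name chosen for illustration.
popdata <- data.frame(
    isocode = c("usa", "usa", "usa", "usa", "usa", "usa", "can", "can", "gbr"),
    year    = c(1990, 1991, 1992, 1993, 1994, 1995, 1992, 1993, 1997),
    pop     = c(10, 12, 15, 13, 16, 17, 5, 6, 15)
)
```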

I want to create another data frame with just 3 rows, one per country, containing each country's most recent year (i.e., the max year) and the population size in that year. What is the easiest way to do this? In SQL, I would use

select t.isocode, t.year, t.pop
   from isocode t
   where t.year = (select max(s.year) from isocode s
                   where s.isocode = t.isocode)
   order by t.isocode;

which gives the answer

     isocode  year  pop
     can      1993    6
     gbr      1997   15
     usa      1995   17

How can I do something as simple in S+?

Thanks,

Walt



--
________________________

Walter R. Paczkowski, Ph.D.
Data Analytics Corp.
44 Hamilton Lane
Plainsboro, NJ 08536
________________________
(V) 609-936-8999
(F) 609-936-3733
dataanalytics@earthlink.net
www.dataanalyticscorp.com

