Hi,
I have a data frame of countries with the following three variables:
the country's isocode (a 3-letter code for the country), the year, and
the population size in that year. Let's say there are 3 countries and
anywhere from 1 to 6 years of population data for each country.
Basically, it's a panel dataset. An example would be:
isocode year pop
usa 1990 10
usa 1991 12
usa 1992 15
usa 1993 13
usa 1994 16
usa 1995 17
can 1992 5
can 1993 6
gbr 1997 15
I want to create another data frame that has just 3 rows, one for each
country, each holding the most recent year for that country (i.e., the
max year) and the population size in that year. What is the easiest way
to do this? In SQL, I would use
select t.isocode, t.year, t.pop
from isocode t
where t.year = (select max(m.year)
                from isocode m
                where m.isocode = t.isocode)
order by t.isocode;
which gives the answer
isocode year pop
can 1993 6
gbr 1997 15
usa 1995 17
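To make the logic I am after concrete, here is a minimal sketch in plain
Python (not S+, and not meant as the definitive way to do it). The names
rows and latest_per_country are made up for illustration; the data is the
example above.

```python
# Example data from above, as (isocode, year, pop) tuples.
rows = [
    ("usa", 1990, 10),
    ("usa", 1991, 12),
    ("usa", 1992, 15),
    ("usa", 1993, 13),
    ("usa", 1994, 16),
    ("usa", 1995, 17),
    ("can", 1992, 5),
    ("can", 1993, 6),
    ("gbr", 1997, 15),
]


def latest_per_country(rows):
    """Return one row per country: the row with that country's max year."""
    latest = {}
    for isocode, year, pop in rows:
        # Keep this row if the country is new or the year is more recent.
        if isocode not in latest or year > latest[isocode][1]:
            latest[isocode] = (isocode, year, pop)
    # Sort by isocode, mirroring "order by isocode" in the SQL.
    return [latest[code] for code in sorted(latest)]


result = latest_per_country(rows)
for isocode, year, pop in result:
    print(isocode, year, pop)
```

This assumes each country has at most one row per year, as in the
example.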
How can I do something as simple in S+?
Thanks,
Walt
--
________________________
Walter R. Paczkowski, Ph.D.
Data Analytics Corp.
44 Hamilton Lane
Plainsboro, NJ 08536
________________________
(V) 609-936-8999
(F) 609-936-3733
dataanalytics@earthlink.net
www.dataanalyticscorp.com