s-news-owner@lists.biostat.wustl.edu wrote on 2008/11/11 08:00:03:
> Hi,
>
> I have a data frame of countries that has the following three variables:
> the country's isocode (a 3-letter code for the country), year and
> population size in that year. Let's say there are 3 countries and
> anywhere from 1 - 6 years of population data for each country.
> Basically, it's a panel dataset. An example would be:
>
> isocode year pop
> usa 1990 10
> usa 1991 12
> usa 1992 15
> usa 1993 13
> usa 1994 16
> usa 1995 17
> can 1992 5
> can 1993 6
> gbr 1997 15
>
> I want to create another data frame that has just 3 rows, one for each
> country, the most recent year for that country (i.e., the max year), and
> the population size for that year for that country. How do I do this in
> the easiest way possible? In SQL, I would use
>
> select isocode, max(year), pop
> from isocode
> group by isocode
> having year = max(year)
> order by isocode;
>
> which gives the answer
>
> isocode year pop
> usa 1995 17
> can 1993 6
> gbr 1997 15
>
> How can I do something as simple in S+?
>
> Thanks,
>
> Walt
Here is one way of doing it:
> theData <-data.frame( isocode=c("usa", "usa", "usa", "usa", "usa",
+ "usa", "can","can","gbr"), year=c(1990,1991,1992,1993,
+ 1994,1995,1992,1993,1997),
+ pop=c(10,12,15,13,16,17,5,6,15))
> theData$isocode <- factor(theData$isocode, levels = c("usa", "can", "gbr"))
>
> by(theData, INDICES = theData$isocode,
+ FUN = function(x) {MaxInd = which.max(x$year)
+ x[MaxInd, ]})
theData$isocode:usa
  isocode year pop
6     usa 1995  17
--------------------------------------------------------------------
theData$isocode:can
  isocode year pop
8     can 1993   6
--------------------------------------------------------------------
theData$isocode:gbr
  isocode year pop
9     gbr 1997  15
by() returns a list-like object with one single-row data frame per country,
so you will need to bind those pieces back together to get one data.frame.
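One possible way to get a single data frame back is sketched below. It is written with R in mind and assumes that split(), lapply() and do.call("rbind", ...) behave the same way in your version of S+, which I have not checked. The names pieces and latest are only illustrative.

```r
# Same example data as above
theData <- data.frame(
  isocode = c("usa", "usa", "usa", "usa", "usa", "usa", "can", "can", "gbr"),
  year = c(1990, 1991, 1992, 1993, 1994, 1995, 1992, 1993, 1997),
  pop = c(10, 12, 15, 13, 16, 17, 5, 6, 15)
)
theData$isocode <- factor(theData$isocode, levels = c("usa", "can", "gbr"))

# Split into one data frame per country and keep the row with the latest year
pieces <- lapply(split(theData, theData$isocode),
                 function(x) x[which.max(x$year), ])

# Stack the one-row pieces into a single data frame, in factor-level order
latest <- do.call("rbind", pieces)
latest
```

If a country had two rows with the same maximum year, which.max() would keep only the first of them.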
Have fun,
>
>
>
>
> --
> ________________________
>
> Walter R. Paczkowski, Ph.D.
> Data Analytics Corp.
> 44 Hamilton Lane
> Plainsboro, NJ 08536
> ________________________
> (V) 609-936-8999
> (F) 609-936-3733
> dataanalytics@earthlink.net
> www.dataanalyticscorp.com
Gérald Jean
Senior Statistical Advisor,
VP Planning and Market Development,
Desjardins Groupe d'Assurances Générales
telephone: (418) 835-4900 ext. 7639
fax: (418) 835-6657
e-mail: gerald.jean@dgag.ca
"In God we trust, all others must bring data" W. Edwards Deming
>
> --------------------------------------------------------------------
> This message was distributed by s-news@lists.biostat.wustl.edu. To
> unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
> the BODY of the message: unsubscribe s-news
Le message ci-dessus, ainsi que les documents l'accompagnant, sont destinés
uniquement aux personnes identifiées et peuvent contenir des informations
privilégiées, confidentielles ou ne pouvant être divulguées. Si vous avez reçu
ce message par erreur, veuillez le détruire.
This communication ( and/or the attachments ) is intended for named recipients
only and may contain privileged or confidential information which is
not to be disclosed. If you received this communication by mistake please
destroy all copies.
Faites bonne impression et imprimez seulement au besoin !
Think green before you print !