s-news

RE Selecting data from a data frame

To: s-news@lists.biostat.wustl.edu
Subject: RE Selecting data from a data frame
From: gerald.jean@dgag.ca
Date: Tue, 11 Nov 2008 10:31:03 -0500
Cc: s-news@lists.biostat.wustl.edu,s-news-owner@lists.biostat.wustl.edu
In-reply-to: <491981D3.9010601@earthlink.net>

s-news-owner@lists.biostat.wustl.edu wrote on 2008/11/11 08:00:03:

> Hi,
>
> I have a data frame of countries that has the following three variables:
> the country's isocode (a 3-letter code for the country), year and
> population size in that year.  Let's say there are 3 countries and
> anywhere from 1 - 6 years of population data for each country.
> Basically, it's a panel dataset.  An example would be:
>
>     isocode year pop
>     usa 1990 10
>     usa 1991 12
>     usa 1992 15
>     usa 1993 13
>     usa 1994 16
>     usa 1995 17
>     can 1992 5
>     can 1993 6
>     gbr 1997 15
>
> I want to create another data frame that has just 3 rows, one for each
> country, the most recent year for that country (i.e., the max year), and
> the population size for that year for that country.  How do I do this in
> the easiest way possible?  In SQL, I would use
>
> select isocode, max(year), pop
>     from isocode
>     group by isocode
>     having year = max(year)
>     order by isocode;
>
> which gives the answer
>
>       isocode  year  pop
>       usa       1995   17
>       can       1993     6
>       gbr       1997   15
>
> How can I do something as simple in S+?
>
> Thanks,
>
> Walt

Here is one way of doing it:

> theData <- data.frame(isocode = c("usa", "usa", "usa", "usa", "usa",
+       "usa", "can", "can", "gbr"), year = c(1990, 1991, 1992, 1993,
+       1994, 1995, 1992, 1993, 1997),
+       pop = c(10, 12, 15, 13, 16, 17, 5, 6, 15))
> theData$isocode <- factor(theData$isocode, levels = c("usa", "can",
+       "gbr"))
>
> by(theData, INDICES = theData$isocode,
+    FUN = function(x) {MaxInd = which.max(x$year)
+                       x[MaxInd, ]})
theData$isocode:usa
  isocode year pop
6     usa 1995  17
--------------------------------------------------------------------
theData$isocode:can
  isocode year pop
8     can 1993   6
--------------------------------------------------------------------
theData$isocode:gbr
  isocode year pop
9     gbr 1997  15

The result is a list-like object holding one one-row data frame per
country, so you will need to manipulate it a little bit to get it back
into a single data.frame.
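
One possible way to do that last step is sketched below. It is written
in R syntax and assumes that S-PLUS behaves the same way for by(),
do.call() and rbind(), which I have not checked. The names pieces and
latest are only illustrative.

```r
# Rebuild the example data from the question.
theData <- data.frame(
  isocode = c("usa", "usa", "usa", "usa", "usa", "usa", "can", "can", "gbr"),
  year    = c(1990, 1991, 1992, 1993, 1994, 1995, 1992, 1993, 1997),
  pop     = c(10, 12, 15, 13, 16, 17, 5, 6, 15))
theData$isocode <- factor(theData$isocode, levels = c("usa", "can", "gbr"))

# One one-row data frame per country: the row with the largest year.
pieces <- by(theData, INDICES = theData$isocode,
             FUN = function(x) x[which.max(x$year), ])

# Stack the pieces into a single data frame (assumption: the "by"
# result can be passed to do.call() as a plain list, as it can in R).
latest <- do.call(rbind, pieces)

# Drop the country names that rbind() attaches as row names.
row.names(latest) <- NULL
latest
```

The rows come out in the order of the factor levels (usa, can, gbr),
which is why the factor() call above fixes the levels explicitly.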

Have fun,

> --
> ________________________
>
> Walter R. Paczkowski, Ph.D.
> Data Analytics Corp.
> 44 Hamilton Lane
> Plainsboro, NJ 08536
> ________________________
> (V) 609-936-8999
> (F) 609-936-3733
> dataanalytics@earthlink.net
> www.dataanalyticscorp.com

Gérald Jean
Senior Statistical Advisor,
VP Planning and Market Development,
Desjardins Groupe d'Assurances Générales
telephone: (418) 835-4900 ext. (7639)
fax      : (418) 835-6657
e-mail   : gerald.jean@dgag.ca

"In God we trust, all others must bring data"  W. Edwards Deming
>
> --------------------------------------------------------------------
> This message was distributed by s-news@lists.biostat.wustl.edu.  To
> unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
> the BODY of the message:  unsubscribe s-news



This communication (and/or the attachments) is intended for named recipients
only and may contain privileged or confidential information which is not to be
disclosed. If you received this communication by mistake please destroy all
copies.




Think green before you print!
