s-news

tapply()

To: "S-PLUS Newsgroup (E-mail)" <s-news@lists.biostat.wustl.edu>
Subject: tapply()
From: Winnie Lambert <lambert.winifred@ensco.com>
Date: Wed, 12 Dec 2001 17:37:13 -0500
All,
 
I am using S-PLUS 6 for Linux.
 
My problem is not that S-PLUS is misbehaving; it is doing exactly what it should.  The problem is that I need to do something else to get what I want, and I don't know what that is.  Here is what I am doing:
 
I first issue the command
 
> good.s <- get(infile)$Dir <= 240 & get(infile)$Dir > 180 & !is.na(get(infile)$Dir)
 
to get a logical vector indicating which records have a non-missing Dir greater than 180 and at most 240.
I use good.s in the command
 
> tapply(get(infile)$Spd[good.s], get(infile)$Hour[good.s], mean)
 
to get the mean value of Spd for each Hour, using only the records that have Dir between 180 and 240.
The output to the above command for a particular data frame is
 
 0        1        2        3        4        5    6        7       8        9
 4 5.444444 6.225806 7.240506 7.333333 8.064815 8.75 8.190083 7.42623 9.353535
 
       10   11       12       13       14       15       16       17       18
 8.783019 8.76 8.817308 9.108696 9.944444 9.385714 9.180723 8.868421 9.551724
 
       19       20       21    23
 7.388889 6.777778 3.333333 5.875
 
To get the number of observations that went into the calculation of the means,
 
> tapply(get(infile)$Spd[good.s], get(infile)$Hour[good.s], length)
 0  1  2  3  4   5   6   7   8  9  10  11  12 13 14 15 16 17 18 19 20 21 23
 4 36 62 79 87 108 120 121 122 99 106 100 104 92 72 70 83 76 58 18  9  3  8
 
Note in both outputs the absence of a value for Hour = 22; all other hours of the day are accounted for.  Since no records with Hour = 22 satisfy the restrictions in good.s, S-PLUS is giving the correct answer.  However, I need the output to show explicitly that the count for Hour = 22 is 0 and the mean for Hour = 22 is NA.  In other words, I would like to see
 
       19       20       21 22    23
 7.388889 6.777778 3.333333 NA 5.875
 
and
 
 0  1  2  3  4   5   6   7   8  9  10  11  12 13 14 15 16 17 18 19 20 21 22 23
 4 36 62 79 87 108 120 121 122 99 106 100 104 92 72 70 83 76 58 18  9  3  0  8
 
Is it possible to slip in an element for Hour = 22 and put these values in their respective tables, or should I use some other function than tapply()?
 
Incidentally, these commands are part of a script that processes many files.  The values given for Dir in this example are actually variables in the script and change from 0 to 360 in variable increments.  Not all files have data missing for Hour = 22; some have all hours available and others have different hours missing.  Any suggestion would have to be flexible enough to handle whichever hours happen to be missing in a given file.  Thanks for any advice.
 
Win Lambert
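 
One possible approach to the question above, offered as a minimal sketch and not as a tested S-PLUS 6 solution: tapply() only reports groups that actually occur in the grouping vector, so converting Hour to a factor with all 24 levels declared up front should keep the empty hours in the result. The toy data frame dat below is hypothetical and stands in for get(infile); only the column names Dir, Spd and Hour come from the post. The code is written in R-compatible S, and the treatment of empty groups (NA for the mean, 0 from table()) is the behavior in R; it is assumed, not confirmed, to carry over to S-PLUS 6.

```r
# Hypothetical toy data standing in for get(infile); the values are invented.
dat <- data.frame(
  Hour = c(21, 21, 22, 23, 23, 23),
  Dir  = c(200, 230, 100, 190, NA, 240),
  Spd  = c(3, 4, 9, 5, 7, 6)
)

# Same filter as in the post: 180 < Dir <= 240, with NAs excluded.
good.s <- !is.na(dat$Dir) & dat$Dir > 180 & dat$Dir <= 240

# Declare all 24 hours as levels, whether or not they occur after filtering.
hour.f <- factor(dat$Hour[good.s], levels = 0:23)

# Mean of Spd per hour; hours with no records are expected to come back as NA.
spd.mean <- tapply(dat$Spd[good.s], hour.f, mean)

# Number of records per hour; table() reports 0 for levels with no records.
spd.n <- table(hour.f)
```

Because the levels are fixed at 0:23 instead of being taken from the data, the same lines should work unchanged for files in which a different hour, or no hour at all, is missing.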