| To: | "S-PLUS Newsgroup (E-mail)" <s-news@lists.biostat.wustl.edu> |
|---|---|
| Subject: | tapply() |
| From: | Winnie Lambert <lambert.winifred@ensco.com> |
| Date: | Wed, 12 Dec 2001 17:37:13 -0500 |
|
All,

I am using S-PLUS 6 for Linux. My problem is not that S-PLUS is misbehaving; it is doing exactly as it should. The problem is that I need to do something else to get what I want, and I don't know what it is. Here is what I am doing.

I first issue the command
> good.s <- get(infile)$Dir <= 240 & get(infile)$Dir > 180 & !is.na(get(infile)$Dir)

to get a logical vector indicating which records have Dir between 180 and 240 and no NAs.
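As a small illustration of the filter above, here is a sketch on made-up data. The data frame `dat` and its values are hypothetical and stand in for `get(infile)`; fetching the object once into a local name is only a convenience, and the logic is unchanged.

```r
# Hypothetical stand-in for get(infile); the values are invented.
dat <- data.frame(
  Hour = c(0, 0, 1, 1, 2, 2),
  Dir  = c(200, 90, 240, NA, 180, 181),
  Spd  = c(4, 7, 5, 6, 3, 8)
)

# TRUE only where Dir is present and satisfies 180 < Dir <= 240.
# The !is.na() term turns the NA comparison results into FALSE.
good.s <- !is.na(dat$Dir) & dat$Dir > 180 & dat$Dir <= 240
good.s
```

Without the `!is.na()` term, the record with a missing Dir would yield NA in `good.s` rather than FALSE.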
I use good.s in the command

> tapply(get(infile)$Spd[good.s], get(infile)$Hour[good.s], mean)
to get the mean value of Spd for each Hour, using only the records that have Dir between 180 and 240.

The output of the above command for a particular data frame is
        0        1        2        3        4        5        6        7        8        9
        4 5.444444 6.225806 7.240506 7.333333 8.064815     8.75 8.190083  7.42623 9.353535
       10       11       12       13       14       15       16       17       18
 8.783019     8.76 8.817308 9.108696 9.944444 9.385714 9.180723 8.868421 9.551724
       19       20       21       23
 7.388889 6.777778 3.333333    5.875

To get the number of observations that went into the calculation of the means,
> tapply(get(infile)$Spd[good.s], get(infile)$Hour[good.s], length)
   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  23
   4  36  62  79  87 108 120 121 122  99 106 100 104  92  72  70  83  76  58  18   9   3   8

Note in both outputs the absence of a value for Hour = 22; all other hours of the day are accounted for. Since no records with Hour = 22 satisfy the restrictions in good.s, S-PLUS is giving the correct answer. However, I need to know and see that length(Hour=22) = 0 and mean(Hour=22) = NA. In other words, I would like to see
       19       20       21       22       23
 7.388889 6.777778 3.333333       NA    5.875

and

   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23
   4  36  62  79  87 108 120 121 122  99 106 100 104  92  72  70  83  76  58  18   9   3   0   8

Is it possible to slip in an element for Hour = 22 and put these values in their respective tables, or should I use some other function than tapply()?
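One possible approach, sketched here in R-compatible S syntax on invented data, is to make the grouping variable a factor that declares all 24 hours as levels, so that empty hours stay in the result. The vectors `Hour` and `Spd` below are hypothetical and stand for the already filtered columns. In R, tapply() leaves empty cells as NA even when the function is length, so this sketch uses table() for the counts; whether S-PLUS 6 behaves identically is an assumption worth checking.

```r
# Hypothetical filtered data: no records for Hour 22 survive the filter.
Hour <- c(20, 20, 21, 23, 23)
Spd  <- c(6, 8, 3, 5, 7)

# Declaring all 24 levels keeps hours with no records in the result.
hour.f <- factor(Hour, levels = 0:23)

means  <- tapply(Spd, hour.f, mean)  # hours with no records are NA
counts <- table(hour.f)              # hours with no records are 0

means[c("20", "21", "22", "23")]
counts[c("20", "21", "22", "23")]
```

Because the levels are fixed in advance, the same two calls work whichever hours happen to be missing from a given file.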
Incidentally, these commands are part of a script that processes many files. The values given for Dir in this example are actually variables in the script and change from 0 to 360 in variable increments. Not all files have data missing for Hour = 22; some have all hours available and others have different hours missing. Any suggestion would have to be flexible in testing whether records are missing for a particular hour. Thanks for any advice.
Win Lambert
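For the scripted case described above, where the Dir bounds vary and different files lack different hours, the idea could be wrapped in a small function. This is only a sketch under stated assumptions: the function name `hourly.summary`, its arguments `lo` and `hi`, and the example data are all hypothetical, and it assumes Hour takes integer values 0 to 23.

```r
# Hypothetical helper: hourly count and mean of Spd for lo < Dir <= hi.
hourly.summary <- function(dat, lo, hi) {
  good <- !is.na(dat$Dir) & dat$Dir > lo & dat$Dir <= hi
  # Fixed levels 0:23 keep every hour in the result, present or not.
  hour.f <- factor(dat$Hour[good], levels = 0:23)
  n <- as.vector(table(hour.f))                        # 0 for empty hours
  m <- as.vector(tapply(dat$Spd[good], hour.f, mean))  # NA for empty hours
  data.frame(Hour = 0:23, n = n, mean = m)
}

# Invented example data; in the script this would be get(infile).
dat <- data.frame(
  Hour = c(0, 0, 21, 22, 23),
  Dir  = c(190, 230, 200, 90, 185),
  Spd  = c(3, 5, 4, 9, 6)
)

res <- hourly.summary(dat, 180, 240)
res[res$Hour >= 21, ]
```

Returning one row per hour means the output has the same shape for every file and every Dir interval, so no per-file test for missing hours is needed.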