Hi, I have a problem with sapply, by and NAs. Here is an example:
> x <- data.frame(IdS=c(1,1,1,2,2,2),
+ Age=c(1,3,4,2,4,6),
+ CFT=c(.2,.5,.7,.3,.5,.6))
> tau <- .6
> x
  IdS Age CFT
1   1   1 0.2
2   1   3 0.5
3   1   4 0.7
4   2   2 0.3
5   2   4 0.5
6   2   6 0.6
IdS is a coding variable for the subject (only 2 subjects here, but the final
file has about 5,000 subjects), and tau is a threshold (varying from
.1 to .9 by .1).
I would like a file with the IdS variable and, for each value of IdS, two ages
(Age1 and Age2) defined as follows:
1. Age1 = age at which max(CFT | CFT <= tau) is reached;
2. Age2 = age at which min(CFT | CFT >= tau) is reached;
3. if undefined, NA.
For tau=.6, I would like to have:
  IdS Age1 Age2
1   1    3    4
2   2    6    6
For tau=.1, I would like to have:
  IdS Age1 Age2
1   1   NA    1
2   2   NA    2
For tau=.7, I would like to have:
  IdS Age1 Age2
1   1    4    4
2   2    6   NA
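To make the definition concrete, here is a minimal sketch in R of what I mean, not a tested solution for the full file. The helper names ages_at_threshold and age_table are hypothetical (they come from no package), and the sketch assumes IdS is numeric and that, when several rows tie, the first one is wanted. It tests for an empty subset before indexing instead of relying on what max() or min() return on an empty subset.

```r
# Hypothetical helper: Age1 and Age2 for the rows of one subject.
ages_at_threshold <- function(d, tau) {
  below <- d[d$CFT <= tau, , drop = FALSE]  # candidate rows for Age1
  above <- d[d$CFT >= tau, , drop = FALSE]  # candidate rows for Age2
  c(Age1 = if (nrow(below) > 0) below$Age[which.max(below$CFT)] else NA_real_,
    Age2 = if (nrow(above) > 0) above$Age[which.min(above$CFT)] else NA_real_)
}

# Hypothetical wrapper: one row per subject for a given threshold.
age_table <- function(x, tau) {
  # split() gives one data frame per IdS; sapply() returns a 2 x n matrix
  res <- t(sapply(split(x, x$IdS), ages_at_threshold, tau = tau))
  data.frame(IdS  = as.numeric(rownames(res)),  # assumes numeric IdS codes
             Age1 = res[, "Age1"],
             Age2 = res[, "Age2"],
             row.names = NULL)
}

# Example data from this post
x <- data.frame(IdS = c(1, 1, 1, 2, 2, 2),
                Age = c(1, 3, 4, 2, 4, 6),
                CFT = c(.2, .5, .7, .3, .5, .6))

age_table(x, 0.7)
```

Calling age_table(x, tau) once per threshold would give one table per value of tau, with NA wherever no observation satisfies the condition.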
If I write:
> sapply(by(x, x$IdS, function(x)
+ x[x$CFT == max(x[x$CFT <= tau, "CFT"]), "Age"]), function(x) x, simplify=T)
> sapply(by(x, x$IdS, function(x)
+ x[x$CFT == min(x[x$CFT >= tau, "CFT"]), "Age"]), function(x) x, simplify=T)
it works well, except when NAs occur.
For tau=.6 (OK):
> sapply(by(x, x$IdS, function(x)
x[x$CFT == max(x[x$CFT <= tau, "CFT"]), "Age"]), function(x)
x, simplify = T)
[1] 3 6
> sapply(by(x, x$IdS, function(x)
x[x$CFT == min(x[x$CFT >= tau, "CFT"]), c("Age")]), function(x)
x, simplify = T)
[1] 4 6
For tau=.7 (the second call returns a list with NAs instead of a vector):
> sapply(by(x, x$IdS, function(x)
x[x$CFT == max(x[x$CFT <= tau, "CFT"]), c("Age")]), function(x)
x, simplify = T)
[1] 4 6
> sapply(by(x, x$IdS, function(x)
x[x$CFT == min(x[x$CFT >= tau, "CFT"]), c("Age")]), function(x)
x, simplify = T)
[[1]]:
[1] 4
[[2]]:
[1] NA NA NA
I need to avoid loops, because of the 5,000 subjects (and sometimes 10
observations per subject) and the 9 thresholds...
Thank you,
Tristan
--
Laboratoire Central des Ponts et Chaussées
[Division ESAR - Section AGR]
Route de Bouaye BP 4129
44341 Bouguenais Cedex
France
Tél 33 (0)2 40 84 56 18
Fax 33 (0)2 40 84 59 92