One way to do what I think you want is to use by(), and then massage the
results into the form you want, e.g.:
> a <- data.frame(ID = c(1 ,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3),
+ DOSE= c(.25,.25,.5,0.25,0.25,2.5,2.5,0.25,
2.5,2.5,2.5,3.5,2.5,2.5,2,2,4,2,2.5,2.5,2,2),
+ DATE=
timeDate(as.character(seq.dates("2006/01/01", by="days", length=22))))
# Construct a 2-d list array, with each element being a vector of
# start/end dates.
# Use character as the type for the start/end dates, because
# timeDate vectors don't work with sapply/unlist/c functions.
> y <- by(a, a[,c("ID","DOSE")], function(x) as.character(range(x$DATE)))
> length(y)
[1] 18
> dimnames(y)
$ID:
[1] "1" "2" "3"
$DOSE:
[1] "0.25" "0.5" "2" "2.5" "3.5" "4"
> y[[1]]
[1] "2006/01/01" "2006/01/08"
> y[1,1]
[[1]]:
[1] "2006/01/01" "2006/01/08"
>
> x <- cbind(expand.grid(dimnames(y)), STARTDATE=sapply(y, function(s)
if (is.null(s)) "NA" else s[[1]]), ENDDATE=sapply(y, function(s) if
(is.null(s)) "NA" else s[[2]]))
> x
ID DOSE STARTDATE ENDDATE
1 1 0.25 2006/01/01 2006/01/08
2 2 0.25 NA NA
3 3 0.25 NA NA
4 1 0.5 2006/01/03 2006/01/03
5 2 0.5 NA NA
6 3 0.5 NA NA
7 1 2 NA NA
8 2 2 2006/01/15 2006/01/16
9 3 2 2006/01/18 2006/01/22
10 1 2.5 2006/01/06 2006/01/07
11 2 2.5 2006/01/09 2006/01/14
12 3 2.5 2006/01/19 2006/01/20
13 1 3.5 NA NA
14 2 3.5 2006/01/12 2006/01/12
15 3 3.5 NA NA
16 1 4 NA NA
17 2 4 2006/01/17 2006/01/17
18 3 4 NA NA
> # ... convert STARTDATE back to timeDate if wanted,
> # and remove rows with NA STARTDATE/ENDDATE by subsetting
Stuyckens, Kim [PRDBE] wrote:
Dear S-News users,
I have a dataset which looks as follows containing thousands of rows:
ID DOSE DATE
_1 0.25 01/01/2006_
_1 0.25 01/02/2006_
1 0.50 01/03/2006
_1 0.25 01/04/2006_
_1 0.25 01/05/2006_
1 2.50 01/06/2006
1 2.50 01/07/2006
1 0.25 01/08/2006
2 2.50 01/09/2006
2 2.50 01/10/2006
2 2.50 01/11/2006
2 3.50 01/12/2006
2 2.50 01/13/2006
2 2.50 01/14/2006
2 2.00 01/15/2006
2 2.00 01/16/2006
2 4.00 01/17/2006
3 2.00 01/18/2006
3 2.50 01/19/2006
3 2.50 01/20/2006
3 2.00 01/21/2006
3 2.00 01/22/2006
a _data.frame(ID = c(1 ,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3),
DOSE= c(.25,.25,.5,0.25,0.25,2.5,2.5,0.25,
2.5,2.5,2.5,3.5,2.5,2.5,2,2,4,2,2.5,2.5,2,2),
DATE=
timeDate(as.character(seq.dates("01/01/2006", by="days", length=22))) )
My problem is that I would like to find the dosing time intervals that a
certain ID(subject in this case) takes the same dose._ Mind that the
same dose can return within the same ID at a later point in time._
The desired result looks as follows
ID DOSE STARTDATE ENDDATE
_1 0.25 01/01/2006 01/02/2006_
1 0.50 01/03/2006 01/03/2006
_1 0.25 01/04/2006 01/05/2006_
1 2.50 01/06/2006 01/07/2006
1 0.25 01/08/2006 01/08/2006
2 2.50 01/09/2006 01/11/2006
2 3.50 01/12/2006 01/12/2006
2 2.50 01/13/2006 01/14/2006
2 2.00 01/15/2006 01/16/2006
2 4.00 01/17/2006 01/17/2006
3 2.00 01/18/2006 01/18/2006
3 2.50 01/19/2006 01/20/2006
3 2.00 01/21/2006 01/22/2006
I solved this problem until now with a loop to look for the moment when
dose is changing, but I hope there is a more efficient way because the
loop is taking to much time and gives memory problems with the amount
of data I have.
Any input or suggestions are very much appreciated
BTW I am using SPLUS 6.2 in a Windows environment
Thanks in advance,
Kind regards,
Kim
|