s-news
[Top] [All Lists]

Re: derive intervals a subject is taken the same dose

To: "Stuyckens, Kim [PRDBE]" <KSTUYCK1@PRDBE.JNJ.COM>, S-news <s-news@lists.biostat.wustl.edu>
Subject: Re: derive intervals a subject is taken the same dose
From: Tony Plate <tplate@blackmesacapital.com>
Date: Wed, 21 Jun 2006 10:30:57 -0600
In-reply-to: <D72CDC56EAC21D4A9698B069A30AC2290C3F8ACE@janbebeexs3.eu.jnj.com>
References: <D72CDC56EAC21D4A9698B069A30AC2290C3F8ACE@janbebeexs3.eu.jnj.com>
User-agent: Mozilla Thunderbird 1.0.5 (Windows/20050711)
One way to do what I think you want is to use by(), and then massage the results into the form you want, e.g.:

> a <- data.frame(ID = c(1 ,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3),
+ DOSE= c(.25,.25,.5,0.25,0.25,2.5,2.5,0.25, 2.5,2.5,2.5,3.5,2.5,2.5,2,2,4,2,2.5,2.5,2,2), + DATE= timeDate(as.character(seq.dates("2006/01/01", by="days", length=22))))
# Construct a 2-d list array, with each element being a vector of
# start/end dates.
# Use character as the type for the start/end dates, because
# timeDate vectors don't work with sapply/unlist/c functions.
> y <- by(a, a[,c("ID","DOSE")], function(x) as.character(range(x$DATE)))
> length(y)
[1] 18
> dimnames(y)
$ID:
[1] "1" "2" "3"

$DOSE:
[1] "0.25" "0.5"  "2"    "2.5"  "3.5"  "4"

> y[[1]]
[1] "2006/01/01" "2006/01/08"
> y[1,1]
[[1]]:
[1] "2006/01/01" "2006/01/08"

>
> x <- cbind(expand.grid(dimnames(y)), STARTDATE=sapply(y, function(s) if (is.null(s)) "NA" else s[[1]]), ENDDATE=sapply(y, function(s) if (is.null(s)) "NA" else s[[2]]))
> x
   ID DOSE  STARTDATE    ENDDATE
 1  1 0.25 2006/01/01 2006/01/08
 2  2 0.25         NA         NA
 3  3 0.25         NA         NA
 4  1  0.5 2006/01/03 2006/01/03
 5  2  0.5         NA         NA
 6  3  0.5         NA         NA
 7  1    2         NA         NA
 8  2    2 2006/01/15 2006/01/16
 9  3    2 2006/01/18 2006/01/22
10  1  2.5 2006/01/06 2006/01/07
11  2  2.5 2006/01/09 2006/01/14
12  3  2.5 2006/01/19 2006/01/20
13  1  3.5         NA         NA
14  2  3.5 2006/01/12 2006/01/12
15  3  3.5         NA         NA
16  1    4         NA         NA
17  2    4 2006/01/17 2006/01/17
18  3    4         NA         NA
> # ... convert STARTDATE back to timeDate if wanted,
> # and remove rows with NA STARTDATE/ENDDATE by subsetting


Stuyckens, Kim [PRDBE] wrote:
Dear S-News users,

I have a dataset which looks as follows containing thousands of rows:
ID      DOSE    DATE
_1       0.25    01/01/2006_
_1       0.25    01/02/2006_
1       0.50    01/03/2006
_1       0.25    01/04/2006_
_1       0.25    01/05/2006_
1       2.50    01/06/2006
1       2.50    01/07/2006
1       0.25    01/08/2006
2       2.50    01/09/2006
2       2.50    01/10/2006
2       2.50    01/11/2006
2       3.50    01/12/2006
2       2.50    01/13/2006
2       2.50    01/14/2006
2       2.00    01/15/2006
2       2.00    01/16/2006
2       4.00    01/17/2006
3       2.00    01/18/2006
3       2.50    01/19/2006
3       2.50    01/20/2006
3       2.00    01/21/2006
3       2.00    01/22/2006

a _data.frame(ID = c(1 ,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3),
DOSE= c(.25,.25,.5,0.25,0.25,2.5,2.5,0.25, 2.5,2.5,2.5,3.5,2.5,2.5,2,2,4,2,2.5,2.5,2,2), DATE= timeDate(as.character(seq.dates("01/01/2006", by="days", length=22))) )

My problem is that I would like to find the dosing time intervals that a certain ID(subject in this case) takes the same dose._ Mind that the same dose can return within the same ID at a later point in time._

The desired result looks as follows

ID      DOSE    STARTDATE       ENDDATE
_1       0.25    01/01/2006      01/02/2006_
1       0.50    01/03/2006      01/03/2006
_1       0.25    01/04/2006      01/05/2006_
1       2.50    01/06/2006      01/07/2006
1       0.25    01/08/2006      01/08/2006
2       2.50    01/09/2006      01/11/2006
2       3.50    01/12/2006      01/12/2006
2       2.50    01/13/2006      01/14/2006
2       2.00    01/15/2006      01/16/2006
2       4.00    01/17/2006      01/17/2006
3       2.00    01/18/2006      01/18/2006
3       2.50    01/19/2006      01/20/2006
3       2.00    01/21/2006      01/22/2006

I solved this problem until now with a loop to look for the moment when dose is changing, but I hope there is a more efficient way because the loop is taking to much time and gives memory problems with the amount of data I have.

Any input or suggestions are very much appreciated

BTW I am using SPLUS 6.2 in a Windows environment

Thanks in advance,
Kind regards,
Kim



<Prev in Thread] Current Thread [Next in Thread>