Tyler,
The first part is easy. You can use the by function to divide the data
set into clusters based on the first 3 columns. Within the function passed
to by, you must divide each cluster into smaller groups if its range of
dates is greater than 10 days. How you do that will greatly affect the
results. Do you split at the largest gap? Do you divide into equal-sized
groups? I would add a column to each group that labels it by the first
three columns and any subsequent splits. You can then reconstruct the data
frame using
do.call("rbind", as.list(by.out)),
where by.out is the object returned by by.
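A minimal sketch of this by() approach, written in R rather than S-PLUS 6.2 (an assumption; it has not been run under S-PLUS). It takes one of the choices above: each cluster is split at its largest date gap, recursively, until no piece spans more than 10 days, and each row is labelled with its key values plus a piece number. splitAtLargestGap, labelGroups, by.out, grp and max.range are hypothetical names, and the test data are the five rows Tyler printed, converted to R Date values.

```r
## Sketch of the by() approach in R; not tested under S-PLUS 6.2.
## splitAtLargestGap() and labelGroups() are hypothetical helpers.

## Split sorted dates at their largest gap, recursively, until every
## piece spans at most max.range days. Returns a piece number per date.
splitAtLargestGap <- function(d, max.range = 10) {
  if (length(d) < 2 || as.numeric(max(d) - min(d)) <= max.range)
    return(rep(1, length(d)))
  at <- which.max(as.numeric(diff(d)))   # index just before the largest gap
  left <- splitAtLargestGap(d[1:at], max.range)
  right <- splitAtLargestGap(d[(at + 1):length(d)], max.range)
  c(left, right + max(left))             # renumber the right-hand pieces
}

## Called by by() once per combination of the key columns.
labelGroups <- function(g, key.cols, date.col, max.range = 10) {
  g <- g[order(g[[date.col]]), ]
  piece <- splitAtLargestGap(g[[date.col]], max.range)
  key <- paste(sapply(g[1, key.cols], as.character), collapse = ".")
  g$grp <- paste(key, piece, sep = ".")  # e.g. "1000.B.FALSE.1"
  g
}

## The five rows from Tyler's printout, with R Date values.
my.df <- data.frame(
  X1 = factor(rep(1000, 5)),
  X2 = factor(c("A", "A", "B", "B", "B")),
  X3 = factor(rep(FALSE, 5)),
  X4 = as.Date(c("2004-01-31", "2004-02-17", "2004-01-28",
                 "2004-02-03", "2004-02-07")))

by.out <- by(my.df, my.df[, 1:3], labelGroups,
             key.cols = 1:3, date.col = 4, max.range = 10)
result <- do.call("rbind", as.list(by.out))
result$grp <- factor(result$grp)
```

Because by() visits the key combinations in its own order, result is not necessarily sorted the way my.df was; re-sort it if the original order matters. Splitting into equal-sized pieces instead of at the largest gap would only change splitAtLargestGap.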
Dave
"Brough, Tyler (FRS)" <TBrough@russell.com>
Sent by: s-news-owner@lists.biostat.wustl.edu
02/17/2005 10:13 AM
To: "'s-news@lists.biostat.wustl.edu'" <s-news@lists.biostat.wustl.edu>
cc:
Subject: [S] Creating a new factor from other factors and a date range
Hello,
I am using S-PLUS 6.2 on Windows XP. I have a data.frame with 3 factors
and a date. The data.frame is sorted by the 3 factors and by date. I would
like to create a new factor designating membership in a group. Each group
is defined as having the same factor values and dates that are less than
some number (e.g. 10) days apart. Does anyone have any suggestions as to
how to accomplish this task? I would also like the solution to work with
more than three factors and with any number of days.
The following replicates my data:
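set.seed(153)
# "<-" (not "=") inside data.frame() leaves the columns unnamed, so they
# print as X1 to X4 below; f1, f2, f3 and td are also created as objects.
my.df <- data.frame(f1 <- as.factor(sample(1000:1010, size=100, replace=T)),
    f2 <- as.factor(sample(c("A","B"), size=100, replace=T)),
    f3 <- as.factor(sample(c(T,F), size=100, replace=T)),
    td <- sample(timeSeq(from="1/1/04", to="2/29/04", format="%Y%02m%02d"),
                 size=100, replace=T))
# sort by the three factors, then by date
my.df <- my.df[order(my.df[,1], my.df[,2], my.df[,3], my.df[,4]), ]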
For example, the first 5 rows of my.df are:
> my.df[1:5,]
X1 X2 X3 X4
40 1000 A FALSE 20040131
87 1000 A FALSE 20040217
99 1000 B FALSE 20040128
100 1000 B FALSE 20040203
49 1000 B FALSE 20040207
The first row would be assigned to group 1, row 2 to group 2, and the
next three rows to group 3 (using a 10-day date range).
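A sketch of a different technique that avoids by() altogether, again in R rather than S-PLUS (not tested there). Because my.df is already sorted, a new group starts wherever any key column changes or the gap to the previous date reaches the limit, so a cumulative sum of those start flags gives the group number. This reads "less than 10 days apart" as a rule on consecutive dates, which is an assumption; it reproduces the grouping above and accepts any number of key columns and any gap. makeGroups, key.cols, date.col and max.gap are hypothetical names.

```r
## Hypothetical helper; sketch in R, not tested under S-PLUS 6.2.
## df must already be sorted by the key columns and then by date.
## A new group starts at row i when any key column differs from row i-1
## or when the date is max.gap or more days after the previous date.
makeGroups <- function(df, key.cols, date.col, max.gap = 10) {
  n <- nrow(df)
  if (n == 0) return(factor(character(0)))
  new.key <- rep(FALSE, n - 1)
  for (col in key.cols) {
    x <- as.character(df[[col]])
    new.key <- new.key | (x[-1] != x[-n])
  }
  # days between consecutive rows; gaps across a key change can be
  # negative, but those rows already start a new group
  gap <- as.numeric(diff(df[[date.col]]))
  starts <- c(TRUE, new.key | gap >= max.gap)
  factor(cumsum(starts))                 # running count of group starts
}

## The five rows from Tyler's printout, with R Date values.
my.df <- data.frame(
  X1 = factor(rep(1000, 5)),
  X2 = factor(c("A", "A", "B", "B", "B")),
  X3 = factor(rep(FALSE, 5)),
  X4 = as.Date(c("2004-01-31", "2004-02-17", "2004-01-28",
                 "2004-02-03", "2004-02-07")))

my.df$grp <- makeGroups(my.df, key.cols = 1:3, date.col = 4, max.gap = 10)
as.integer(my.df$grp)
# [1] 1 2 3 3 3
```

Under this rule a chain of short gaps can form a group whose total span exceeds max.gap; if the span itself must stay within the limit, a range-based split such as the one Dave describes is needed instead.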
Thank you in advance for your suggestions.
-Tyler