s-news
[Top] [All Lists]

Re: rpart output to SAS

To: Terry Therneau <therneau@mayo.edu>
Subject: Re: rpart output to SAS
From: Frank E Harrell Jr <fharrell@virginia.edu>
Date: Wed, 05 Sep 2001 15:42:37 -0400
Cc: Jeff.Morrison@suntrust.com, Jeff@hsrnfs-101.mayo.edu, Morrison@hsrnfs-101.mayo.edu, SAS@hsrnfs-101.mayo.edu, a@hsrnfs-101.mayo.edu, create@hsrnfs-101.mayo.edu, from@hsrnfs-101.mayo.edu, how@hsrnfs-101.mayo.edu, know@hsrnfs-101.mayo.edu, program@hsrnfs-101.mayo.edu, rpart@hsrnfs-101.mayo.edu, s-news@wubios.wustl.edu, the@hsrnfs-101.mayo.edu, to@hsrnfs-101.mayo.edu, wanted@hsrnfs-101.mayo.edu
Organization: University of Virginia
References: <200109051841.NAA24342@rocky.mayo.edu>
In case anyone agrees, I have a method in the S Hmisc
library for making SAS code out of S functions (at
least some of them) called sascode.  If you wanted
to call the function sascode.rpart that would
fit in nicely with Hmisc.  -Frank Harrell

Terry Therneau wrote:
> 
> output, i.e., an Splus function that would write the program for him.
> There isn't such a function, but it was an interesting exercise to do so.
> (No, I didn't really have time for this, but it was an interesting puzzle
> over lunch hour.)
>    Below is a modified "print.rpart" function that does what he desires.
>         1. Some of the "if" statements could be replaced by "else" statements,
>         but that looks like too much work to figure out.
> 
>         2. The output is not correct for a categorical variable: it prints
>         out the SAS code  "if (Type=Large,Medium,Van)" for instance.  I'll
>         let someone else add that enhancement.
> 
> -----------------------------------------------------
> S-PLUS : Copyright (c) 1988, 2000 MathSoft, Inc.
> S : Copyright Lucent Technologies, Inc.
> Version 6.0 Release 1 for Sun SPARC, SunOS 5.6 : 2000
> Working data will be in .Data
> > fit
> n= 228
> 
> node), split, n, loss, yval, (yprob)
>       * denotes terminal node
> 
>  1) root 228 63 2 (0.2763158 0.7236842)
>    2) time>=171.5 169 61 2 (0.3609467 0.6390533)
>      4) age< 45.5 10  2 1 (0.8000000 0.2000000) *
>      5) age>=45.5 159 53 2 (0.3333333 0.6666667)
>       10) time< 304 76 32 2 (0.4210526 0.5789474)
>         20) ph.ecog< 1.5 62 30 2 (0.4838710 0.5161290)
>           40) age< 69.5 48 22 1 (0.5416667 0.4583333)
>             80) time>=219.5 29 10 1 (0.6551724 0.3448276) *
>             81) time< 219.5 19  7 2 (0.3684211 0.6315789) *
>           41) age>=69.5 14  4 2 (0.2857143 0.7142857) *
>         21) ph.ecog>=1.5 14  2 2 (0.1428571 0.8571429) *
>       11) time>=304 83 21 2 (0.2530120 0.7469880)
>         22) time>=802.5 7  2 1 (0.7142857 0.2857143) *
>         23) time< 802.5 76 16 2 (0.2105263 0.7894737) *
>    3) time< 171.5 59  2 2 (0.0338983 0.9661017) *
> 
> > rparttosas(fit)
>    if (time>=171.5) then do;
>       if (age< 45.5) then rpnode= 4 ;
>       if (age>=45.5) then do;
>          if (time< 304) then do;
>             if (ph.ecog< 1.5) then do;
>                if (age< 69.5) then do;
>                   if (time>=219.5) then rpnode= 80 ;
>                   if (time< 219.5) then rpnode= 81 ;
>                   end;
>                if (age>=69.5) then rpnode= 41 ;
>                end;
>             if (ph.ecog>=1.5) then rpnode= 21 ;
>             end;
>          if (time>=304) then do;
>             if (time>=802.5) then rpnode= 22 ;
>             if (time< 802.5) then rpnode= 23 ;
>             end; end; end;
>    if (time< 171.5) then rpnode= 3 ;
> 
> ---------------------
> 
>   The idea is to create variable rpnode, which has the value of the starred
> lines in the printout.
> 
>         Terry Therneau
> 
> -------------------------------------------------
> #
> # Here is a special program written at the behest of a user.
> #  He wants a SAS program that will create the predicted node number
> #  for each subject.
> # The code is 99% a copy of print.rpart.
> rparttosas <- function(x, minlength=0, spaces=3, cp,
>                digits=.Options$digits, ...) {
>     if(!inherits(x, "rpart")) stop("Not legitimate rpart object")
>     if (!is.null(x$frame$splits)) x <- rpconvert(x)  #help for old objects
> 
>     if (!missing(cp)) x <- prune.rpart(x, cp=cp)
>     frame <- x$frame
>     node <- as.numeric(row.names(frame))
>     depth <- tree.depth(node)
>     indent <- paste(rep(" ", spaces * 32), collapse = "")
>     #32 is the maximal depth
>     if(length(node) > 1) {
>         indent <- substring(indent, 1, spaces * 1:max(depth))
>         indent <- c("", indent[depth])
>         }
>     else stop("Tree has only 1 node")
> 
>     # this is the ending part of each line
>     term <- ifelse(frame$var == "<leaf>", paste("then rpnode=", node, ";"),
>                                                 "then do;")
>     # the first part of the line
>     z <- labels(x, digits=digits, minlength=minlength, ...)
>     z <- paste(indent, "if (", z, ") ", sep='')
> 
>     # add in the "end" statements that go with each "do"
>     delta  <- -diff(c(depth,1))   #leftward movement of the indent
>     temp <- paste(rep("end;", max(delta)), collapse=' ')
>     endlist <- substring(temp, 1, 5*delta)
>     endstring <- ifelse(delta>0, paste("\n", indent, endlist, sep=''), "")
>     z <- paste(z, term, endstring, sep='')
> 
>     cat(z[-1], sep = "\n")  # the -1 prevents listing the root node
>     return(invisible(x))
>     }
> 
> ---------------------------------------------------------------------
> This message was distributed by s-news@lists.biostat.wustl.edu.  To
> unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
> the BODY of the message:  unsubscribe s-news

-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat

<Prev in Thread] Current Thread [Next in Thread>