s-news
[Top] [All Lists]

rpart output to SAS

To: Jeff.Morrison@suntrust.com
Subject: rpart output to SAS
From: Terry Therneau <therneau@mayo.edu>
Date: Wed, 5 Sep 2001 13:41:49 -0500 (CDT)
Cc: Jeff@hsrnfs-101.mayo.edu, Morrison@hsrnfs-101.mayo.edu, SAS@hsrnfs-101.mayo.edu, a@hsrnfs-101.mayo.edu, create@hsrnfs-101.mayo.edu, from@hsrnfs-101.mayo.edu, how@hsrnfs-101.mayo.edu, know@hsrnfs-101.mayo.edu, program@hsrnfs-101.mayo.edu, rpart@hsrnfs-101.mayo.edu, s-news@wubios.wustl.edu, the@hsrnfs-101.mayo.edu, to@hsrnfs-101.mayo.edu, wanted@hsrnfs-101.mayo.edu
output, i.e., an Splus function that would write the program for him.
There isn't such a function, but it was an interesting exercise to do so.
(No, I didn't really have time for this, but it was an interesting puzzle 
over lunch hour.)  
   Below is a modified "print.rpart" function that does what he desires.
        1. Some of the "if" statements could be replaced by "else" statements,
        but that looks like too much work to figure out.

        2. The output is not correct for a categorical variable: it prints
        out the SAS code  "if (Type=Large,Medium,Van)" for instance.  I'll
        let someone else add that enhancement.

-----------------------------------------------------
S-PLUS : Copyright (c) 1988, 2000 MathSoft, Inc.
S : Copyright Lucent Technologies, Inc.
Version 6.0 Release 1 for Sun SPARC, SunOS 5.6 : 2000 
Working data will be in .Data 
> fit
n= 228 

node), split, n, loss, yval, (yprob)
      * denotes terminal node

 1) root 228 63 2 (0.2763158 0.7236842)  
   2) time>=171.5 169 61 2 (0.3609467 0.6390533)  
     4) age< 45.5 10  2 1 (0.8000000 0.2000000) *
     5) age>=45.5 159 53 2 (0.3333333 0.6666667)  
      10) time< 304 76 32 2 (0.4210526 0.5789474)  
        20) ph.ecog< 1.5 62 30 2 (0.4838710 0.5161290)  
          40) age< 69.5 48 22 1 (0.5416667 0.4583333)  
            80) time>=219.5 29 10 1 (0.6551724 0.3448276) *
            81) time< 219.5 19  7 2 (0.3684211 0.6315789) *
          41) age>=69.5 14  4 2 (0.2857143 0.7142857) *
        21) ph.ecog>=1.5 14  2 2 (0.1428571 0.8571429) *
      11) time>=304 83 21 2 (0.2530120 0.7469880)  
        22) time>=802.5 7  2 1 (0.7142857 0.2857143) *
        23) time< 802.5 76 16 2 (0.2105263 0.7894737) *
   3) time< 171.5 59  2 2 (0.0338983 0.9661017) *

> rparttosas(fit)
   if (time>=171.5) then do;
      if (age< 45.5) then rpnode= 4 ;
      if (age>=45.5) then do;
         if (time< 304) then do;
            if (ph.ecog< 1.5) then do;
               if (age< 69.5) then do;
                  if (time>=219.5) then rpnode= 80 ;
                  if (time< 219.5) then rpnode= 81 ;
                  end; 
               if (age>=69.5) then rpnode= 41 ;
               end; 
            if (ph.ecog>=1.5) then rpnode= 21 ;
            end; 
         if (time>=304) then do;
            if (time>=802.5) then rpnode= 22 ;
            if (time< 802.5) then rpnode= 23 ;
            end; end; end;
   if (time< 171.5) then rpnode= 3 ;

---------------------

  The idea is to create variable rpnode, which has the value of the starred
lines in the printout.

        Terry Therneau

-------------------------------------------------
#
# Here is a special program written at the behest of a user.
#  He wants a SAS program that will create the predicted node number
#  for each subject.
# The code is 99% a copy of print.rpart.
rparttosas <- function(x, minlength=0, spaces=3, cp, 
               digits=.Options$digits, ...) {
    if(!inherits(x, "rpart")) stop("Not legitimate rpart object")
    if (!is.null(x$frame$splits)) x <- rpconvert(x)  #help for old objects

    if (!missing(cp)) x <- prune.rpart(x, cp=cp)
    frame <- x$frame
    node <- as.numeric(row.names(frame))
    depth <- tree.depth(node) 
    indent <- paste(rep(" ", spaces * 32), collapse = "")   
    #32 is the maximal depth
    if(length(node) > 1) {
        indent <- substring(indent, 1, spaces * 1:max(depth))
        indent <- c("", indent[depth])
        }
    else stop("Tree has only 1 node")
    
    # this is the ending part of each line
    term <- ifelse(frame$var == "<leaf>", paste("then rpnode=", node, ";"), 
                                                "then do;")
    # the first part of the line 
    z <- labels(x, digits=digits, minlength=minlength, ...)
    z <- paste(indent, "if (", z, ") ", sep='')

    # add in the "end" statements that go with each "do"
    delta  <- -diff(c(depth,1))   #leftward movement of the indent
    temp <- paste(rep("end;", max(delta)), collapse=' ')
    endlist <- substring(temp, 1, 5*delta)
    endstring <- ifelse(delta>0, paste("\n", indent, endlist, sep=''), "")
    z <- paste(z, term, endstring, sep='')

    cat(z[-1], sep = "\n")  # the -1 prevents listing the root node
    return(invisible(x))
    }


<Prev in Thread] Current Thread [Next in Thread>