
Summary: Specifying a primary split on a regression tree

To: s-news@lists.biostat.wustl.edu
Subject: Summary: Specifying a primary split on a regression tree
From: "Wing, Michael" <Michael.Wing@orst.edu>
Date: Wed, 27 Feb 2002 07:35:42 -0800

My original post:

> I'm using S-Plus 2000 (Release 3) to create regression trees and am
> interested in investigating surrogate splits.  I would like to explicitly
> specify a primary split from my candidate variables and allow the software
> to create trees based on this primary split.  Any suggestions would be
> welcome.

I received one response; thank you, Greg Snow.

Dr. Snow pointed out that I could use the edit.tree function to specify a primary split.  He also explained that burl.tree provides graphical assistance in identifying other primary splits that may partition the data set effectively.  Unfortunately for my needs, edit.tree also requires the user to supply the data value at which the split is made.  I have nine trees and wish to consider seven primary splits for each tree (to satisfy a manuscript reviewer).  I am still searching for a method to force a primary split on a chosen variable without having to identify the data value for the split myself.

I include the full response below:

This is assuming that you are using the built-in tree functions.  There is
also the Rpart library which has some improvements, but I don't know it
well enough to say if or how to do the same things in Rpart.

For example, let's assume you are using the dataset fuel.frame and have
created the initial tree using:

> fuel.tree <- tree(Mileage ~ Weight + Disp., data=fuel.frame)

you can examine the effects of different splits by using burl.tree():

> tree.screens()
> plot(fuel.tree,type="u")
> text(fuel.tree)
> fuel.tree.burl <- burl.tree(fuel.tree)

This puts you into an interactive mode with the graph.  Click on a split
in the graph and at the bottom of the screen you will see the effect of
all possible splits at that point.  The x-value of the graph indicates the
splitting value (e.g. Disp. < 134) and the height of the lines is how much
the deviance is reduced by that split.  This will give you a quick
graphical view of other splits that may also be good.  Click on other
splits to see the effect.  Right click when you want to quit the
interactive plot.  Now fuel.tree.burl contains the information from the
last split you clicked on in case you want to compare the actual numbers
rather than just the graphs.

Now that you have a different split that you want to try, you can use
edit.tree() to change the split and regrow the tree beyond that point.

> fuel.tree2 <- edit.tree(fuel.tree, node=1, var="Weight",
+ splitl = 2567.5)

Now, fuel.tree2 is forced to have the first split based on Weight < 2567.5
and follows the standard pattern below that.  You can edit other nodes
as well.

Also look at identify.tree and browser.tree for other possible help.
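
On the open problem noted in the summary (forcing a split on a chosen variable without picking the split value by hand), one possible workaround is to compute the value yourself and pass it to edit.tree.  The sketch below is only an illustration and has not been tested on S-Plus 2000: best.cut is a hypothetical helper, not part of S-Plus, and it assumes a numeric predictor, a numeric response with no missing values, and the usual regression-tree deviance (the within-node sum of squares).  It tries the midpoint between each pair of adjacent distinct predictor values and keeps the one that reduces the deviance most, which is the quantity burl.tree displays as line heights.

```r
# Hypothetical helper (not an S-Plus function): for a numeric predictor x
# and numeric response y, return the cut point c such that splitting into
# x < c and x >= c gives the largest reduction in the sum of squares.
best.cut <- function(x, y)
{
    ux <- sort(unique(x))
    if(length(ux) < 2)
        stop("x needs at least two distinct values")
    # candidate cuts: midpoints between adjacent distinct values of x
    cuts <- (ux[-1] + ux[-length(ux)])/2
    # deviance of the unsplit node
    total <- sum((y - mean(y))^2)
    gain <- numeric(length(cuts))
    for(i in seq(along = cuts)) {
        left <- y[x < cuts[i]]
        right <- y[x >= cuts[i]]
        # reduction in deviance achieved by this cut
        gain[i] <- total - sum((left - mean(left))^2) -
            sum((right - mean(right))^2)
    }
    # first cut attaining the maximum reduction
    cuts[gain == max(gain)][1]
}
```

Assuming edit.tree accepts the value through splitl as in the example above, the forced split could then be requested without typing a number, and the call could be repeated in a loop over the candidate variables:

> w.cut <- best.cut(fuel.frame$Weight, fuel.frame$Mileage)
> fuel.tree3 <- edit.tree(fuel.tree, node=1, var="Weight",
+ splitl = w.cut)

A plain for loop is used inside best.cut rather than sapply with an inline function so that the helper does not depend on how local variables are looked up inside nested functions.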
