Hi:
I have been trying to calculate the relative importance of the input variables
in a regression/classification tree by summing, for each input variable, the
reduction in deviance produced by every split (internal node) on that variable,
as suggested in the literature.
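A minimal R sketch of that importance measure follows. It assumes the layout of the `frame` component that R's tree package stores in a fitted object: one row per node, row names are node numbers (children of node k are 2k and 2k + 1), `var` is the split variable or "<leaf>", and `dev` is the node deviance. I have not checked that S-Plus 6.2 uses exactly the same layout, and `vapply` is R-only. The function name `split_importance` and the toy frame are hypothetical; the toy numbers are made up and are not the kyphosis fit.

```r
# Assumes an R `tree`-style frame: row names = node numbers (children of
# node k are 2k and 2k + 1), `var` = split variable or "<leaf>", `dev` = deviance.
split_importance <- function(frame) {
  node <- as.integer(row.names(frame))
  var  <- as.character(frame$var)
  dev  <- frame$dev
  internal <- which(var != "<leaf>")
  # Deviance reduction at each split: parent deviance minus both children
  red <- vapply(internal, function(i) {
    kids <- match(2L * node[i] + 0:1, node)
    dev[i] - sum(dev[kids])
  }, numeric(1))
  # Sum the reductions by split variable, largest first
  vars <- unique(var[internal])
  imp <- vapply(vars, function(v) sum(red[var[internal] == v]), numeric(1))
  sort(imp, decreasing = TRUE)
}

# Hypothetical toy frame with made-up deviances, rows in preorder
toy <- data.frame(var = c("Start", "Age", "<leaf>", "<leaf>", "Age", "<leaf>", "<leaf>"),
                  dev = c(80, 10, 4, 3, 50, 20, 15),
                  row.names = c("1", "2", "4", "5", "3", "6", "7"),
                  stringsAsFactors = FALSE)
split_importance(toy)  # named vector: Start = 20, Age = 18
```

On a real fit this would be called as `split_importance(fit$frame)`, assuming the frame has the layout above. Because the per-split reductions telescope, they add up to the root deviance minus the total leaf deviance (80 - 42 = 38 in the toy), so for the full tree this measure does not need the pruning sequence at all.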
My plan was to obtain the sequence of all possible subtrees and their
respective deviances. I also need the ordered list of the variables involved
in each additional split that produces the next larger tree.
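That plan amounts to replaying weakest-link (cost-complexity) pruning by hand. The R sketch below assumes the same tree-style frame layout as above; at each step it collapses the internal node t with the smallest g(t) = (dev(t) - total deviance of the leaves under t) / (number of those leaves - 1) and records the split variable that disappears. Ties are broken by taking the first such node, whereas prune.tree may collapse tied nodes together. `in_subtree`, `weakest_link_sequence` and the toy frame are hypothetical names with made-up numbers.

```r
# Hypothetical helper: TRUE for each node number in `nodes` that lies in the
# subtree rooted at node t (children of node k are 2k and 2k + 1).
in_subtree <- function(t, nodes) {
  vapply(nodes, function(d) {
    while (d > t) d <- d %/% 2L  # climb towards the root
    d == t
  }, logical(1))
}

# Weakest-link pruning on a tree-style frame: one output row per step with the
# node collapsed, its split variable, its g value and the resulting tree size.
weakest_link_sequence <- function(frame) {
  node  <- as.integer(row.names(frame))
  var   <- as.character(frame$var)
  dev   <- frame$dev
  leaf  <- var == "<leaf>"
  alive <- rep(TRUE, length(node))
  out <- data.frame(node = integer(0), var = character(0), g = numeric(0),
                    size = integer(0), stringsAsFactors = FALSE)
  repeat {
    internal <- which(alive & !leaf)
    if (length(internal) == 0) break
    g <- vapply(internal, function(i) {
      sub <- alive & in_subtree(node[i], node)
      (dev[i] - sum(dev[sub & leaf])) / (sum(sub & leaf) - 1)
    }, numeric(1))
    i <- internal[which.min(g)]  # weakest link (first one if tied)
    # Collapse node i: drop its descendants and turn it into a leaf
    alive[in_subtree(node[i], node) & node != node[i]] <- FALSE
    leaf[i] <- TRUE
    out <- rbind(out, data.frame(node = node[i], var = var[i], g = min(g),
                                 size = sum(alive & leaf),
                                 stringsAsFactors = FALSE))
  }
  out
}

# Hypothetical toy frame with made-up deviances, rows in preorder
toy <- data.frame(var = c("Start", "Age", "<leaf>", "<leaf>", "Age", "<leaf>", "<leaf>"),
                  dev = c(80, 10, 4, 3, 50, 20, 15),
                  row.names = c("1", "2", "4", "5", "3", "6", "7"),
                  stringsAsFactors = FALSE)
weakest_link_sequence(toy)
# steps: node 2 (Age, g = 3, size 3), node 3 (Age, g = 15, size 2),
#        node 1 (Start, g = 20, size 1)
```

Read from the bottom up, the `var` column gives the variable of each split added as the tree grows. Because each step collapses a whole branch, the size can still drop by more than one terminal node at a time.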
Annoyingly, if I use
prune.tree(mytree)
I get an incomplete sequence of trees; in other words,
some of the subtrees are left out.
For example, if I compute the subtree sequence for the kyphosis data,
starting from a tree with 6 terminal nodes, prune.tree does not return
the subtree of size 5 (see the output below).
Is there a way to control prune.tree (or the preceding tree() call)
so that it returns every possible subtree in the sequence?
I use S-Plus 6.2 for Windows.
> tree1 <- prune.tree(tree(Kyphosis ~ Age + Number + Start, data = kyphosis), best = 6)
> treeseq <- prune.tree(tree1)
> treeseq
$size:
[1] 6 4 3 2 1
$dev:
[1] 47.87590 53.11331 57.25188 64.25730 83.23447
$k:
[1] -Inf 2.618706 4.138573 7.005418 18.977175
$method:
[1] "deviance"
attr(, "class"):
[1] "prune" "tree.sequence"
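The $k values above can be checked by hand. prune.tree reports only trees that minimize deviance + k * size for some k, and each k equals the deviance increase divided by the number of terminal nodes removed relative to the previous, larger tree. The first step removes two terminal nodes at a single k (6 to 4: either a branch with three terminal nodes was collapsed or two splits tied), so a 5-node subtree never appears. A small R check using only the numbers printed above:

```r
# Values copied from the prune.tree() output above
size <- c(6, 4, 3, 2, 1)
dev  <- c(47.87590, 53.11331, 57.25188, 64.25730, 83.23447)
# Deviance increase per terminal node removed at each pruning step
k <- diff(dev) / -diff(size)
round(k, 4)  # 2.6187 4.1386 7.0054 18.9772, matching $k[-1] up to rounding
```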
--
****************************************************************
Ernst Linder elinder@math.unh.edu
Department of Mathematics and Statistics 603 - 862- 2687
University of New Hampshire Fax: 603 - 862 - 4096
Durham, NH 03824 www.math.unh.edu/~elinder
****************************************************************