Re: Q: How can I get a complete sequence of sub-trees and

To: "'Ernst Linder'" <elinder@cisunix.unh.edu>, s-news@lists.biostat.wustl.edu
Subject: Re: Q: How can I get a complete sequence of sub-trees and
From: "Liaw, Andy" <andy_liaw@merck.com>
Date: Tue, 24 Feb 2004 14:30:22 -0500
Ernst,

I don't have a direct solution to your problem, but (worse) I'd like
to warn you against using such an importance measure on some data sets.
This type of measure can be very misleading if the predictor variables used
in the tree have very different numbers of possible splits.  (This
description might sound strange, but that's the fact.)  For a numerical
variable, the number of possible splits is the number of unique values
minus 1.  For a categorical variable, the number of possible splits is the
number of ways to divide the categories into two non-empty groups.
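
To make the counting concrete, here is a minimal sketch in Python (not
S-Plus).  The function names and the sample values are made up for
illustration.  For a categorical variable with k levels, the number of ways
to divide the levels into two non-empty groups is 2^(k-1) - 1, so the count
grows very quickly with k.

```python
def numeric_splits(values):
    """Distinct binary splits of the form x <= c: unique values minus 1."""
    return len(set(values)) - 1


def categorical_splits(n_levels):
    """Ways to divide n_levels categories into two non-empty groups."""
    return 2 ** (n_levels - 1) - 1


# Made-up numeric predictor with 5 unique values.
ages = [1, 15, 15, 27, 36, 36, 36, 61]
print(numeric_splits(ages))          # 4

# Hypothetical categorical predictors with 2, 3, 5 and 10 levels.
for k in (2, 3, 5, 10):
    print(k, categorical_splits(k))  # 1, 3, 15, 511
```

So a 10-level factor offers 511 candidate splits, while a binary factor
offers exactly one.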

The gist of the problem is this: most tree induction algorithms tend to
`prefer' splitting on variables with a larger number of possible splits.
Given two predictors with the same correlation with the response, a tree
will tend to split on the one with more possible splits.  The net effect is
that the measure you mention will make variables with a larger number of
possible splits look more important than they really are.
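
A small simulation can illustrate the effect.  The sketch below is Python,
not S-Plus, and it is not the `tree' algorithm itself: it assumes a
regression setting where the split criterion is the reduction in residual
sum of squares, and it uses two hypothetical predictors that are both pure
noise, one binary (a single possible split) and one continuous (up to n - 1
possible splits).  All names and settings (n_obs, n_reps, seed) are
assumptions chosen for illustration.

```python
import random


def best_sse_reduction(x, y):
    """Largest drop in residual sum of squares over all splits x <= c."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    total = sum(v for _, v in pairs)
    best = 0.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # tied x values cannot be separated by a split
        right_sum = total - left_sum
        # Between-group sum of squares for left size i, right size n - i.
        reduction = left_sum ** 2 / i + right_sum ** 2 / (n - i) - total ** 2 / n
        best = max(best, reduction)
    return best


def simulate(n_obs=50, n_reps=500, seed=1):
    """Fraction of replicates in which the continuous noise variable wins."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        x_binary = [rng.randint(0, 1) for _ in range(n_obs)]  # 1 possible split
        x_continuous = [rng.random() for _ in range(n_obs)]   # many possible splits
        if best_sse_reduction(x_continuous, y) > best_sse_reduction(x_binary, y):
            wins += 1
    return wins / n_reps


print(simulate())
```

Neither predictor carries any information about y, so an unbiased procedure
would pick each about half the time.  The continuous one gets many more
chances to find a lucky split, so it should be chosen well over half the
time.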

(Some authors have tried to modify splitting procedures to `correct the
bias', but they all have their own problems...)

Best,
Andy

> From: Ernst Linder
> 
> 
> 
> Hi:
> 
> I have been trying to calculate the relative importance of 
> the input variables in a
> regression/classification tree by summing the reduction in 
> deviance that results from
> each split (internal node) in the tree for each of the input 
> variables.  (This
> is suggested in the literature).
> 
> My plan was to get the sequence of all possible subtrees and 
> their respective deviances.
> Furthermore I also need the ordered list of the variables 
> that are involved in
> each additional split that creates the next larger tree.
> 
> Annoyingly if I use
>      prune.tree(mytree)
> I get an incomplete sequence of trees; in other words,
> some of the subtrees are left out.
> For example:
> If I call the subtree sequence from the kyphosis data, starting with a
> tree with 6 terminal nodes,  the prune.tree command does not 
> give me the
> subtree of size 5 (see output below).
> 
> Is there a way that I can control the prune.tree or the 
> previous tree command
> so that it gives me all possible subtrees in the sequence?
> 
> I use S-Plus 6.2 for Windows.
> 
> 
> 
>   > tree1 <- prune.tree(tree(Kyphosis ~ Age + Number + Start, 
> data = kyphosis), best = 6)
>   > treeseq <- prune.tree(tree1)
>   > treeseq
> $size:
> [1] 6 4 3 2 1
> 
> $dev:
> [1] 47.87590 53.11331 57.25188 64.25730 83.23447
> 
> $k:
> [1]      -Inf  2.618706  4.138573  7.005418 18.977175
> 
> $method:
> [1] "deviance"
> 
> attr(, "class"):
> [1] "prune"         "tree.sequence"
> 
> 
> -- 
> ****************************************************************
> Ernst Linder                        elinder@math.unh.edu
> Department of Mathematics and Statistics     603 - 862- 2687
> University of New Hampshire            Fax:  603 - 862 - 4096
> Durham, NH 03824                www.math.unh.edu/~elinder
> ****************************************************************
> 
> 
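
A note on the quoted output, offered as a possible reading rather than a
fix: assuming prune.tree follows cost-complexity (weakest-link) pruning,
each value in $k should equal the increase in deviance per terminal node
removed when moving to the next smaller tree in the sequence.  The Python
sketch below (not S-Plus) checks that arithmetic against the numbers printed
above.  The function name is made up for illustration.

```python
# Values copied from the quoted prune.tree output.
sizes = [6, 4, 3, 2, 1]
devs = [47.87590, 53.11331, 57.25188, 64.25730, 83.23447]
ks = [2.618706, 4.138573, 7.005418, 18.977175]  # finite entries of $k


def per_leaf_cost(sizes, devs):
    """Deviance increase per terminal node removed, for each pruning step."""
    return [
        (devs[i + 1] - devs[i]) / (sizes[i] - sizes[i + 1])
        for i in range(len(sizes) - 1)
    ]


for cost, k in zip(per_leaf_cost(sizes, devs), ks):
    print(round(cost, 5), k)
```

The step from 6 to 4 terminal nodes costs about 5.24 in deviance, or about
2.62 per node removed, which matches the first finite k.  Under that
reading, the sequence jumps over size 5 because the weakest link removed two
terminal nodes in a single step.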


