
Re: tree() fails to discriminate: summary

To: s-news group <s-news@wubios.wustl.edu>
Subject: Re: tree() fails to discriminate: summary
From: "D. Mckenzie" <dmck@u.washington.edu>
Date: Fri, 22 Jun 2001 08:45:34 -0700 (PDT)
In-reply-to: <Pine.GSO.4.31.0106220712010.15487-100000@auk.stats>
Thanks to Greg Arnold, Nick Ellis, Sam Buttrey, and Brian Ripley for
cogent responses to my question about classification trees failing to
discriminate between classes 1 and 0.  Under the deviance criterion, which
is the recommended way to grow trees, a split that greatly reduces the
deviance can still leave the same class with a plurality in each leaf.
Pruning by misclassification rate is the solution.  Discussions are
in Breiman et al. 1984 and V&R 3rd edition.

My original question follows with key excerpts from the responses of my
benefactors:


On Thu, 21 Jun 2001, D. Mckenzie wrote:

> I have rather noisy data: presence/absence of tree species as functions
> of environmental variables.  Sample sizes range from 1500 to
> 4000.  tree() tends to overfit models, as has been observed by
> several authors.  The misclassification rate seems to go up almost
> linearly (with my data) with reduction in the number of terminal nodes.
> When pruning back to "sensible" sizes, I notice that some terminal
> splits have the same value (1 or 0) at both nodes.  Using various
> criteria to prune the tree, or with an unpruned tree, this happens.
>
> Obviously (I think) a split is useless if it does not discriminate.  Is
> this a common feature of classification trees, or is it characteristic of
> certain kinds of data, or is it another indicator of lack of fit?  Has
> anyone tried hacking tree() or prune.tree() to circumvent this?

from Greg Arnold

The different nodes may have the same classification, but the
uncertainty related to that classification will be different.  You
will probably find that in one node the error rate is very small, and
in the other quite large.  The tree minimises the deviance, which is
a measure of the uniformity in each node.  The deviance drops when a
muddled node is split into a pure node and another muddled node; that
is what you are observing.

from Nick Ellis

A classification tree provides probabilities of each
class at a node. The 'classification' of a particular node is simply the
class with the maximum probability. Based on this, a split can appear not
to discriminate, but is nevertheless a useful split if you look at the
probabilities.

For example, suppose a node has 1050 in class A and 49 in class B. After
splitting suppose the counts are (A,B)=(1000,0) for the left daughter node
and (50,49) for the right. The classification is still A for both daughter
nodes, but the split is nevertheless a very useful one.
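
The counts in this example can be checked numerically.  The sketch below
is in Python rather than S, and it assumes the usual multinomial deviance
for a node, -2 * sum(n_k * log(n_k / n)).  The function names are
illustrative only and are not part of tree().

```python
import math

def deviance(counts):
    """Multinomial deviance of one node: -2 * sum(n_k * log(n_k / n)).
    Empty classes contribute nothing (0 * log 0 is taken as 0)."""
    n = sum(counts)
    return -2.0 * sum(c * math.log(c / n) for c in counts if c > 0)

def misclassified(counts):
    """Number of cases in the node that are not in its plurality class."""
    return sum(counts) - max(counts)

# Counts (A, B) from the example above.
parent = (1050, 49)
left, right = (1000, 0), (50, 49)

dev_before = deviance(parent)
dev_after = deviance(left) + deviance(right)
err_before = misclassified(parent)
err_after = misclassified(left) + misclassified(right)

print(f"deviance:      {dev_before:.1f} -> {dev_after:.1f}")
print(f"misclassified: {err_before} -> {err_after}")
```

Under that assumption the deviance falls from about 401 to about 137,
while the number of misclassified cases stays at 49.  That is why the split
is kept when growing by deviance but earns nothing when pruning by
misclassification rate.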

from Sam Buttrey

Sometimes a split will produce a big reduction in deviance even though the
same class ends up with a plurality in each leaf. For example, a split
might produce one leaf that's 100% class A and 0% class B, and another
leaf that's 51% class A and 49% class B. They're both "class A" leaves
but the split may have reduced the deviance by "a lot."

You can get around this at cross-validation and pruning time by specifying
prune.misclass instead of prune.tree.

So why not just build the tree so as to reduce misclassification rate,
rather than deviance? Breiman et al., in the CART book, give a good example
of why that doesn't work out.
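
One way to see the difficulty is to compare two candidate splits of the
same node.  The counts below are assumptions chosen for illustration and
are not quoted from this thread.  The sketch is in Python, uses
hypothetical helper names, and again assumes the multinomial deviance
-2 * sum(n_k * log(n_k / n)).

```python
import math

def deviance(counts):
    """Multinomial deviance of one node; empty classes contribute 0."""
    n = sum(counts)
    return -2.0 * sum(c * math.log(c / n) for c in counts if c > 0)

def misclassified(counts):
    """Number of cases in the node that are not in its plurality class."""
    return sum(counts) - max(counts)

def split_summary(daughters):
    """Total (misclassified, deviance) over the daughter nodes of a split."""
    err = sum(misclassified(d) for d in daughters)
    dev = sum(deviance(d) for d in daughters)
    return err, dev

# Hypothetical parent node with 400 cases in each of two classes.
split_a = [(300, 100), (100, 300)]   # two moderately mixed daughters
split_b = [(200, 400), (200, 0)]     # one mixed daughter, one pure daughter

err_a, dev_a = split_summary(split_a)
err_b, dev_b = split_summary(split_b)

print(f"split A: misclassified={err_a}, deviance={dev_a:.1f}")
print(f"split B: misclassified={err_b}, deviance={dev_b:.1f}")
```

Both splits misclassify 200 cases, so misclassification rate cannot choose
between them.  Deviance prefers split B, which isolates a pure node.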

from Brian Ripley

If you prune on misclassification rate (as recommended by most people)
this does not happen.  The split is not useless in that the probability
predictions differ, but it is not effective for just predicting a class
with the default 0-1 loss structure.

There's an example and comment in V&R3, specifically on page 327.


