tree() fails to discriminate

To: <s-news@wubios.wustl.edu>
Subject: tree() fails to discriminate
From: "D. Mckenzie" <dmck@u.washington.edu>
Date: Thu, 21 Jun 2001 13:13:29 -0700 (PDT)
I have rather noisy data: presence/absence of tree species as functions
of environmental variables.  Sample sizes range from 1500 to
4000.  tree() tends to overfit models, as several authors have observed.
With my data, the misclassification rate goes up almost linearly as the
number of terminal nodes is reduced.  When pruning back to "sensible"
sizes, I notice that some terminal splits predict the same class (1 or 0)
at both child nodes.  This happens under various pruning criteria, and
also with an unpruned tree.
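
One way such a split can arise, sketched in Python rather than S.  This
assumes splits are chosen by reduction in binomial deviance and that each
leaf is labelled by its majority class.  The counts are invented for
illustration and are not from the data described here.

```python
import math

def deviance(n_present, n_total):
    """Binomial deviance of a node evaluated at its own observed proportion."""
    dev = 0.0
    for k in (n_present, n_total - n_present):
        if k > 0:
            dev -= 2.0 * k * math.log(k / n_total)
    return dev

def majority_label(n_present, n_total):
    """Label a node 1 if presences are the strict majority, else 0."""
    return 1 if 2 * n_present > n_total else 0

# Hypothetical counts: (presences, cases) at a parent and its two children.
parent = (30, 100)
left = (10, 50)    # 20% presence
right = (20, 50)   # 40% presence

# The split is "worth making" by deviance ...
gain = deviance(*parent) - (deviance(*left) + deviance(*right))
# ... yet both children still carry the same majority label.
labels = (majority_label(*left), majority_label(*right))

print("deviance reduction:", round(gain, 2))
print("child labels:", labels)
```

Under these assumptions the split separates the class proportions (20% vs
40%) and so reduces deviance, but neither child crosses 50%, so both
leaves are labelled 0 and the misclassification count is unchanged.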

Obviously (I think) a split is useless if it does not discriminate.  Is
this a common feature of classification trees, is it characteristic of a
certain kind of data, or is it another indicator of lack of fit?  Has
anyone tried hacking tree() or prune.tree() to circumvent this?
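
A minimal sketch of the kind of post-processing asked about, again in
Python.  The nested-dict tree layout and the name collapse() are
hypothetical and are not the structure that tree() or prune.tree() use;
the idea is only to merge, bottom-up, any split whose two children are
leaves with the same label.

```python
def collapse(node):
    """Return a copy of the tree with non-discriminating splits merged.

    A leaf is {"label": 0 or 1}; an internal node is
    {"split": description, "left": subtree, "right": subtree}.
    """
    if "label" in node:
        return {"label": node["label"]}
    # Collapse the subtrees first so that merges can cascade upward.
    left = collapse(node["left"])
    right = collapse(node["right"])
    if "label" in left and "label" in right and left["label"] == right["label"]:
        return {"label": left["label"]}
    return {"split": node["split"], "left": left, "right": right}

# Hypothetical fitted tree: the inner split has label 0 on both sides.
fitted = {
    "split": "elevation < 1200",
    "left": {
        "split": "precip < 80",
        "left": {"label": 0},
        "right": {"label": 0},
    },
    "right": {"label": 1},
}

pruned = collapse(fitted)
print(pruned)
```

Merging leaves this way leaves the predicted classes, and hence the
misclassification rate, unchanged.  It does discard any difference in
class proportions between the merged leaves, which matters if the fitted
probabilities are of interest rather than the labels alone.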

Thanks in advance for any guidance.

_______________________________________________________________________

                           DON MCKENZIE

                        Research Ecologist
              College of Forest Resources, Box 352100
                      University of Washington
                        Seattle, WA 98195

                            206.543.2789
                        dmck@u.washington.edu

_______________________________________________________________________