[S] Tree mystery explained!

To: "'s-news@wubios.wustl.edu'" <s-news@wubios.wustl.edu>
Subject: [S] Tree mystery explained!
From: "Buttrey, Samuel" <sebuttre@monterey.nps.navy.mil>
Date: Fri, 23 Jul 1999 09:47:25 -0700
Cc: "Kobayashi, Izumi" <ikobaya@nps.navy.mil>, "Koyak, Robert" <rakoyak@monterey.nps.navy.mil>, "Whitaker, Lyn" <LWhitaker@monterey.nps.navy.mil>, "Read, Robert" <RRead@monterey.nps.navy.mil>
Sender: owner-s-news@wubios.wustl.edu
Hi. I posted a question to this group a while ago, and since I've now
figured it out to my satisfaction, I thought I'd share what I've learned in
case anyone is interested. (The short answer is that if a number in a split
label needs to contain more than six digits, it'll help you to round or rank
your x's.)

My question concerned running tree() and getting inconsistent results,
inconsistent in the sense that the "n" column of the "frame" component didn't
match up with the "where" component. This meant that prune.tree(), for
example, was confused and failed. Here is a very simple example. Consider
this six-row data frame.

> test <- data.frame (y = as.factor (c(0, 0, 1, 1, 1, 1)), x = c(0, 1,
1.00001, 1.00001, 2, 2))

The ideal tree would split x between 1 and 1.00001, perhaps at the
mid-point, 1.000005. Then both observations with y = 0 would go to the left
and all four with y = 1 would go to the right. What do you get when you run
the tree?

> test.tree <- tree (y ~ x, data = test, minsize = 2)
> test.tree$frame
     var n     dev yval splits.cutleft splits.cutright   yprob.0   yprob.1 
1      x 6 7.63817    1             <1              >1 0.3333333 0.6666667
2 <leaf> 2 0.00000    0                                1.0000000 0.0000000
3 <leaf> 4 0.00000    1                                0.0000000 1.0000000

The split is at "x < 1," and rows 2 and 3 of the frame say there are two
items with x < 1 and four with x > 1. However, you and I know this isn't
quite right: only one observation (x = 0) is strictly less than 1. The
"where" component disagrees with the frame:

> table (test.tree$where)
 2 3 
 1 5

The problem seems to be in the representation of the strings in the "splits"
matrix. (Recall that the "splits" element of the "frame" is in fact a
matrix.) These strings have some limit on the number of digits they can
carry, apparently six, and this doesn't appear to be affected by the
options()$digits setting. The mid-point 1.000005 needs seven digits, so its
label comes out as "1". This "broken" tree cannot be pruned:

> prune.tree (test.tree, best = 1)
Error in .C("VR_prune2",: subroutine VR_prune2: 1 Inf value(s) in argument 5
Dumped
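
To make the mismatch concrete, here is a minimal sketch in base S/R syntax.
It does not call tree() at all. It only assumes (my guess, not something
confirmed from the tree() source) that the rows are assigned using the number
recovered from the printed label, while the "n" column reflects the cut that
was actually intended. The names cut.true and cut.label are made up for
illustration.

```r
# Data from the example above
x <- c(0, 1, 1.00001, 1.00001, 2, 2)

# Intended cut: midpoint between the neighbouring values 1 and 1.00001
cut.true <- (1 + 1.00001) / 2

# The label printed in the frame was "<1"; parsing that text back gives 1
# (assumption: row assignment uses the value recovered from the label)
cut.label <- as.numeric("1")

n.left.true  <- sum(x < cut.true)    # rows sent left by the intended cut
n.left.label <- sum(x < cut.label)   # rows sent left by the cut as labelled

print(c(intended = n.left.true, labelled = n.left.label))
```

The intended cut sends two rows left and the labelled cut sends only one,
which is the same disagreement seen between the frame's "n" column and
table (test.tree$where).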

Try the same operation with this data frame, where the third and fourth x's
have one fewer zero:

> test <- data.frame (y = as.factor (c(0, 0, 1, 1, 1, 1)), x = c(0, 1,
1.0001, 1.0001, 2, 2))
> test.tree <- tree (y ~ x, data = test, minsize = 2)
> test.tree$frame
     var n     dev yval splits.cutleft splits.cutright   yprob.0   yprob.1 
1      x 6 7.63817    1       <1.00005        >1.00005 0.3333333 0.6666667
2 <leaf> 2 0.00000    0                                1.0000000 0.0000000
3 <leaf> 4 0.00000    1                                0.0000000 1.0000000

Note the correct location of the split, and the number of digits in the text
of the "splits." Here the frame agrees with table():

> table (test.tree$where)
 2 3 
 2 4

and this tree can be pruned.

> prune.tree (test.tree, best = 1)
node), split, n, deviance, yval, (yprob)
      * denotes terminal node

1) root 6 7.638 1 ( 0.3333 0.6667 ) *

So the moral of the story seems to be this: if your x's are too close
together where a split needs to be made, watch out. My colleague Bob Koyak
suggests rounding the x's before running tree(); as long as the rounding
doesn't merge values that a split needs to separate, it preserves the
ordering of the x's and the tree will be unaffected. Ranking the x's
preserves the ordering exactly. (Of course you'll want to keep track of the
original x's for purposes of interpretation.)
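
As a rough sketch of the ranking variant, again in base S/R syntax and
without calling tree(): replace each x by its position among the distinct
sorted values, and keep those values so a split on the ranks can be read
back on the original scale. The names levels.x, x.rank and rank.cut are
hypothetical, and the cut at 2.5 is just the mid-point between ranks 2 and
3, chosen by hand for illustration.

```r
# Data from the first example
x <- c(0, 1, 1.00001, 1.00001, 2, 2)

# Distinct original values in ascending order; kept for interpretation
levels.x <- sort(unique(x))

# Dense ranks: ties share a rank, ordering of x is preserved
x.rank <- match(x, levels.x)          # 1 2 3 3 4 4

# A cut between ranks 2 and 3 needs only two digits in its label
rank.cut <- 2.5
n.left <- sum(x.rank < rank.cut)      # rows sent to the left

# Translate the rank cut back to the original scale:
# the split lies between these two neighbouring x values
orig.interval <- levels.x[c(floor(rank.cut), ceiling(rank.cut))]

print(x.rank)
print(n.left)
print(orig.interval)
```

One would then fit the tree on x.rank instead of x and use levels.x to
report where each split falls in the original units.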

Thanks for listening, and have fun,
Sam ("Joyce Kilmer") Buttrey
buttrey@nps.navy.mil
