List members,
I'm trying to rank variables in rpart a la pp. 146-150 in Breiman et
al. (1984). The closest that I can come is to sum the "improve = "
value from the summary.rpart output for each variable over all nodes.
However, the Therneau et al. (1997) introduction document says (page 20)
that these values are "n times the change in the impurity index".
Is n the number of observations in each node, or the
total number of observations? I assume it's the latter, but if it's the
former, do I have to divide "improve" by n at each node
prior to summing?
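For concreteness, here is a rough sketch of the summation I have in mind. It is only an illustration, not a definitive implementation: it uses the kyphosis data that ships with rpart as a stand-in for my own data, it counts primary splits only (no competitor or surrogate splits), and it assumes that the rows of fit$splits are stored node by node as primary split, then competitors, then surrogates.

```r
library(rpart)

# Example fit; kyphosis is a stand-in data set, not my real data.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

ff <- fit$frame
is_split <- ff$var != "<leaf>"   # internal (non-terminal) nodes only

# Assumed layout of fit$splits: for each internal node, one primary
# split row, then ncompete competitor rows, then nsurrogate surrogate rows.
rows_per_node <- 1 + ff$ncompete[is_split] + ff$nsurrogate[is_split]
primary <- cumsum(c(1, head(rows_per_node, -1)))

# "improve" of each primary split, labelled by its splitting variable.
improve <- fit$splits[primary, "improve"]
vars <- rownames(fit$splits)[primary]

# Sum of "improve" per variable over all nodes (no division by n).
importance <- tapply(improve, vars, sum)
print(sort(importance, decreasing = TRUE))
```

If n turns out to be the per-node count, the only change would be to divide improve by ff$n[is_split] before the tapply() call.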
Also, what are the functional definitions of "agree=" and "adj=" in the
Surrogate splits section of summary.rpart?
Thanks much for the help!
cheers,
e.
--
Eric Archer, Ph.D.
NOAA-SWFSC
8604 La Jolla Shores Dr.
La Jolla, CA 92037
858-546-7121,7003(FAX)
eric.archer@noaa.gov
"We have fossils... We win!"
- Lewis Black, on creationism
"Cogita tute" - Think for yourself
"Yea, though I walk in the valley of the shadow of
death, I shall fear no evil, for I am the baddest
mutha in the whole valley."