
Variable ranking in rpart

To: s-news@lists.biostat.wustl.edu
Subject: Variable ranking in rpart
From: "Eric Archer" <Eric.Archer@noaa.gov>
Date: Mon, 27 Sep 2004 12:24:24 -0700
List members,

I'm trying to rank variables in rpart à la pp. 146-150 of Breiman et al. (1984). The closest I can come is to sum, for each variable over all nodes, the "improve =" value reported in the summary.rpart output. However, the Therneau et al. (1997) introduction document (p. 20) says that these values are "n times the change in the impurity index". Is n the number of observations in each node, or the total number of observations? I assume it's the latter, but if it's the former, do I have to divide "improve" by n at each node before summing?
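A minimal sketch of why the choice of n matters, assuming (not confirmed here) that "improve" equals the node size n_t times the impurity decrease of the primary split at that node, with unit case weights. The node records, variable names (`depth`, `temp`), and the helper `rank` are all hypothetical; the sketch just compares three ways of aggregating the per-node values.

```python
from collections import defaultdict

# Hypothetical per-node records: (variable, n_node, delta_i), where delta_i is
# the impurity decrease i(t) - p_L*i(t_L) - p_R*i(t_R) of the primary split at
# node t. Assumption: "improve" as printed by summary.rpart equals n_node * delta_i.
NODES = [
    ("depth", 200, 0.06),  # root split
    ("temp", 20, 0.20),
    ("temp", 50, 0.04),
]
N_TOTAL = 200  # observations at the root node


def rank(records, scale):
    """Sum a rescaled "improve" per variable and sort from largest to smallest."""
    totals = defaultdict(float)
    for var, n_node, delta_i in records:
        improve = n_node * delta_i  # assumed to match the printed value
        totals[var] += scale(improve, n_node)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# (a) Raw sum of "improve".
raw = rank(NODES, lambda imp, n: imp)
# (b) Divide by the root N: same ordering as (a); equals the sum of
#     p(t) * delta_i(t) with p(t) = n_node / N_TOTAL.
weighted = rank(NODES, lambda imp, n: imp / N_TOTAL)
# (c) Divide by n at each node: unweighted sum of delta_i(t), ignoring node size.
unweighted = rank(NODES, lambda imp, n: imp / n)

print(raw)
print(weighted)
print(unweighted)
```

Under these assumptions, dividing by the total N only rescales every variable's sum by the same constant, so (a) and (b) always agree on the ranking. Dividing by each node's own n can reorder variables: in this toy data (a) and (b) put `depth` first while (c) puts `temp` first.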

Also, what are the functional definitions of "agree=" and "adj=" in the "Surrogate splits" section of the summary.rpart output?
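A hedged sketch of one commonly described reading of these two numbers, which is an assumption here and worth checking against the rpart documentation: "agree" is the fraction of the node's observations that the surrogate sends in the same direction as the primary split, and "adj" rescales that agreement against the baseline of sending every observation in the primary split's majority direction. The function name `surrogate_agreement` and the L/R encoding are hypothetical, and only observations non-missing on both variables are assumed to be counted.

```python
def surrogate_agreement(primary, surrogate):
    """
    primary, surrogate: lists of "L"/"R" directions for the observations in one
    node (assumed non-missing on both the primary and the surrogate variable).
    Returns (agree, adj) under the assumed definitions:
      agree = fraction sent the same way by the surrogate as by the primary split
      adj   = (agree - majority) / (1 - majority), where majority is the agreement
              obtained by sending every observation in the larger direction
    """
    n = len(primary)
    agree = sum(p == s for p, s in zip(primary, surrogate)) / n
    n_left = primary.count("L")
    majority = max(n_left, n - n_left) / n
    # If the primary split sends everyone one way, the majority rule is perfect
    # and no surrogate can improve on it.
    adj = (agree - majority) / (1 - majority) if majority < 1 else 0.0
    return agree, adj


# 6 observations go left and 4 go right; the surrogate matches 8 of the 10.
primary = ["L"] * 6 + ["R"] * 4
surrogate = ["L"] * 5 + ["R"] + ["R"] * 3 + ["L"]
agree, adj = surrogate_agreement(primary, surrogate)
print(f"agree={agree:.3f}, adj={adj:.3f}")  # prints agree=0.800, adj=0.500
```

Read this way, a surrogate with high "agree" can still have a small "adj" when the primary split is very lopsided, because the majority rule alone already achieves most of that agreement.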

Thanks much for the help!

cheers,
e.
-- 


Eric Archer, Ph.D.
NOAA-SWFSC
8604 La Jolla Shores Dr.
La Jolla, CA 92037
858-546-7121,7003(FAX)
eric.archer@noaa.gov


"We have fossils... We win!"
    - Lewis Black, on creationism

"Cogita tute" - Think for yourself

"Yea, though I walk in the valley of the shadow of
  death, I shall fear no evil, for I am the baddest
  mutha in the whole valley."