s-news

tree model rpart - printcp output

To: s-news@lists.biostat.wustl.edu
Subject: tree model rpart - printcp output
From: Stephanie A Mather <STEPHANIE.MATHER@huskymail.uconn.edu>
Date: Thu, 02 Nov 2006 10:44:00 -0500

Hi all -

I am fitting a regression tree with rpart in S-Plus 7.0. I have a model and am
trying to figure out what the rel error, xerror, and xstd columns mean in the
printcp output. Here is my output from printcp:

> printcp(trpmilav.rp)

Regression tree:
rpart(formula = trpmilav ~ hhsize + kids18 + adults18 + hhtype + 
        numveh + numwrker + numdrver + income + poppersq + urban +
        cbdg25 + cbdg100, data = HHTable.11.01Final, control = 
        rpart.control(minbucket = 30, cp = 0.001, xval = 10))

Variables actually used in tree construction:
[1] cbdg100  cbdg25   income   numwrker poppersq

Root node error: 5.8745e6/4072 = 1442.6

n=4072 (271 observations deleted due to missing values)

         CP nsplit rel error xerror    xstd 
1 0.0064231      0   1.00000 1.0006 0.40944
2 0.0035024      2   0.98715 1.0156 0.41187
3 0.0027420      3   0.98365 1.0193 0.41225
4 0.0017144      6   0.97543 1.0251 0.41228
5 0.0012832     12   0.96500 1.0331 0.41248
6 0.0010689     13   0.96371 1.0342 0.41245
7 0.0010000     14   0.96264 1.0354 0.41245
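For reference, these are the standard rpart definitions (from the package documentation, not from this thread): rel error is each subtree's training error divided by the root node error, so for a regression tree it equals 1 - R^2; xerror is the cross-validated error (10-fold here, from xval = 10) on the same relative scale; and xstd is the standard error of xerror. A minimal R sketch, assuming the S-Plus rpart behaves like R's: it re-enters the table above by hand (the name cptab is hypothetical; in R's rpart the same numbers are stored in trpmilav.rp$cptable) and converts the relative figures back to mean squared errors.

```r
# Hypothetical hand re-entry of the printcp table above.
cptab <- data.frame(
  CP     = c(0.0064231, 0.0035024, 0.0027420, 0.0017144, 0.0012832, 0.0010689, 0.0010000),
  nsplit = c(0, 2, 3, 6, 12, 13, 14),
  rel    = c(1.00000, 0.98715, 0.98365, 0.97543, 0.96500, 0.96371, 0.96264),
  xerror = c(1.0006, 1.0156, 1.0193, 1.0251, 1.0331, 1.0342, 1.0354),
  xstd   = c(0.40944, 0.41187, 0.41225, 0.41228, 0.41248, 0.41245, 0.41245)
)

# Root node error = sum of squares about the overall mean / n.
# The numerator printed by printcp is rounded, so this is only approximately 1442.6.
root_err <- 5.8745e6 / 4072

# rel error and xerror are both scaled by the root node error;
# multiplying back gives mean squared errors in squared response units.
cptab$train_mse <- cptab$rel * root_err
cptab$cv_mse    <- cptab$xerror * root_err

# For a regression (anova) tree, 1 - rel error is the training R^2.
cptab$train_r2 <- 1 - cptab$rel

print(round(cptab, 4))
```

Read this way, the 14-split tree explains under 4% of the training variance (rel error 0.96264), and every xerror is above 1 while the differences between rows are much smaller than xstd, which suggests that cross-validation finds no subtree predicting better than the overall mean.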

In all the textbook examples I've seen, xerror decreases as CP gets smaller
(going down the table); why does mine go up instead? And what is the best CP
value to prune at?
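One widely used way to choose CP is the one-standard-error rule from Breiman et al.'s CART book, which the rpart documentation also describes: find the row with the smallest xerror, then take the simplest tree whose xerror is within one xstd of that minimum. A minimal R sketch applied by hand to the numbers printed above (the variable names are hypothetical, and it assumes the S-Plus rpart follows the same conventions as R's):

```r
# Columns copied by hand from the printcp output above (hypothetical names).
cp     <- c(0.0064231, 0.0035024, 0.0027420, 0.0017144, 0.0012832, 0.0010689, 0.0010000)
nsplit <- c(0, 2, 3, 6, 12, 13, 14)
xerror <- c(1.0006, 1.0156, 1.0193, 1.0251, 1.0331, 1.0342, 1.0354)
xstd   <- c(0.40944, 0.41187, 0.41225, 0.41228, 0.41248, 0.41245, 0.41245)

# Row with the smallest cross-validated error, and the 1-SE threshold above it.
best   <- which.min(xerror)
thresh <- xerror[best] + xstd[best]

# Rows run from fewest to most splits, so the first row at or under the
# threshold is the simplest tree the 1-SE rule accepts.
pick <- which(xerror <= thresh)[1]

cat("min-xerror row:", best, " 1-SE row:", pick,
    " nsplit:", nsplit[pick], " CP:", cp[pick], "\n")
```

With this table both the minimum-xerror row and the 1-SE row are row 1 (nsplit = 0). In R's rpart, passing a row's CP value to prune(), e.g. prune(trpmilav.rp, cp = 0.0064231), returns that row's subtree, and plotcp() plots xerror against CP with a reference line one SE above the minimum.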

Any advice would be greatly appreciated - thanks in advance!

-Stephanie

Stephanie Mather
Graduate Research Assistant
University of Connecticut
Dept of Civil & Enviro. Engineering
261 Glenbrook Rd, Unit 2037
Storrs, CT 06269-2037

