
Re: tree model rpart - printcp output

To: Stephanie A Mather <STEPHANIE.MATHER@huskymail.uconn.edu>, s-news <s-news@lists.biostat.wustl.edu>
Subject: Re: tree model rpart - printcp output
From: Volker Bahn <lochapoka@web.de>
Date: Fri, 03 Nov 2006 10:25:16 -0500
In-reply-to: <1f70651fd2be.1fd2be1f7065@huskymail.uconn.edu>
References: <1f70651fd2be.1fd2be1f7065@huskymail.uconn.edu>
User-agent: Thunderbird 1.5.0.7 (Windows/20060909)
The increasing error is indeed a bad sign. Usually the cross-validated error (xerror) decreases up to a certain number of splits (= model complexity) and then starts to increase again. One pruning strategy is to take the model with the lowest xerror, add one standard error (its xstd) to that value, and then go to the simplest model whose xerror is smaller than this sum (the 1-standard-error rule, Breiman et al. 1984). In your case, however, every additional split only increases the error, which indicates that your set contains no variables that can robustly explain the variability in your dependent variable by the method of regression trees.
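As a minimal sketch of that rule, the following applies it by hand to the numbers in the printcp output quoted below. The object names cp.tab, i.min, threshold and i.best are made up for this illustration; with a fitted model, the same table should be available as the cptable component of the rpart object.

```r
# CP table copied from the printcp() output quoted below.
# Rows are ordered from the simplest tree (0 splits) to the most complex.
cp.tab <- cbind(
  CP     = c(0.0064231, 0.0035024, 0.0027420, 0.0017144,
             0.0012832, 0.0010689, 0.0010000),
  nsplit = c(0, 2, 3, 6, 12, 13, 14),
  xerror = c(1.0006, 1.0156, 1.0193, 1.0251, 1.0331, 1.0342, 1.0354),
  xstd   = c(0.40944, 0.41187, 0.41225, 0.41228, 0.41248, 0.41245, 0.41245)
)

# Row with the lowest cross-validated error
i.min <- which.min(cp.tab[, "xerror"])

# Lowest xerror plus one standard error
threshold <- cp.tab[i.min, "xerror"] + cp.tab[i.min, "xstd"]

# Simplest model (first row) whose xerror is smaller than the threshold
i.best <- min(which(cp.tab[, "xerror"] < threshold))

cp.tab[i.best, c("CP", "nsplit")]

# With the fitted object, one would then prune at the selected CP, e.g.
# (assuming the model object from the question):
# pruned <- prune(trpmilav.rp, cp = cp.tab[i.best, "CP"])
```

Here the lowest xerror is already in the first row (0 splits), so the rule selects the root-only tree, which matches the conclusion above.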
HTH

Volker

Stephanie A Mather wrote:
Hi all -

I am doing tree regression using S-Plus 7.0 and rpart.  I came up with a model 
and am trying to figure out what the rel error, xerror, and xstd rows stand for 
when using the printcp command.  Here is my output when using printcp:

printcp(trpmilav.rp)

Regression tree:
rpart(formula = trpmilav ~ hhsize + kids18 + adults18 + hhtype + numveh + numwrker + numdrver + income + poppersq + urban + cbdg25 + cbdg100, data = HHTable.11.01Final, control = rpart.control(minbucket = 30, cp = 0.001, xval = 10))

Variables actually used in tree construction:
[1] cbdg100  cbdg25   income   numwrker poppersq

Root node error: 5.8745e6/4072 = 1442.6

n=4072 (271 observations deleted due to missing values)

         CP nsplit rel error xerror    xstd
1 0.0064231      0   1.00000 1.0006 0.40944
2 0.0035024      2   0.98715 1.0156 0.41187
3 0.0027420      3   0.98365 1.0193 0.41225
4 0.0017144      6   0.97543 1.0251 0.41228
5 0.0012832     12   0.96500 1.0331 0.41248
6 0.0010689     13   0.96371 1.0342 0.41245
7 0.0010000     14   0.96264 1.0354 0.41245

In all the examples in textbooks I've seen, the xerror column decreases as CP 
increases - why does mine go up?  And what's the best CP value to prune at?

Any advice would be greatly appreciated - thanks in advance!

-Stephanie

Stephanie Mather
Graduate Research Assistant
University of Connecticut
Dept of Civil & Enviro. Engineering
261 Glenbrook Rd, Unit 2037
Storrs, CT 06269-2037




