Hi all -
I am fitting a regression tree using S-Plus 7.0 and rpart. I came up with a model
and am trying to figure out what the rel error, xerror, and xstd columns stand for
in the output of the printcp command. Here is my output from printcp:
> printcp(trpmilav.rp)
Regression tree:
rpart(formula = trpmilav ~ hhsize + kids18 + adults18 + hhtype +
numveh + numwrker + numdrver + income + poppersq + urban +
cbdg25 + cbdg100, data = HHTable.11.01Final, control =
rpart.control(minbucket = 30, cp = 0.001, xval = 10))
Variables actually used in tree construction:
[1] cbdg100 cbdg25 income numwrker poppersq
Root node error: 5.8745e6/4072 = 1442.6
n=4072 (271 observations deleted due to missing values)
         CP nsplit rel error xerror    xstd
1 0.0064231      0   1.00000 1.0006 0.40944
2 0.0035024      2   0.98715 1.0156 0.41187
3 0.0027420      3   0.98365 1.0193 0.41225
4 0.0017144      6   0.97543 1.0251 0.41228
5 0.0012832     12   0.96500 1.0331 0.41248
6 0.0010689     13   0.96371 1.0342 0.41245
7 0.0010000     14   0.96264 1.0354 0.41245
In all the examples in textbooks I've seen, the xerror column decreases as CP
decreases (that is, as the tree gets more splits), at least at first - why does
mine go up from the very first split? And what's the best CP value to prune at?
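
In case it helps frame an answer, here is a minimal sketch of how I would try to
pick a row from the table, assuming the commonly cited one-standard-error rule
applies (take the smallest tree whose xerror is within one xstd of the minimum
xerror). The numbers are typed in by hand from the printcp output above, and
cp.table, best, threshold, and one.se are just names I made up for illustration,
not anything rpart produces.

```r
# Values copied by hand from the printcp() output above.
cp.table <- data.frame(
  CP     = c(0.0064231, 0.0035024, 0.0027420, 0.0017144,
             0.0012832, 0.0010689, 0.0010000),
  nsplit = c(0, 2, 3, 6, 12, 13, 14),
  xerror = c(1.0006, 1.0156, 1.0193, 1.0251, 1.0331, 1.0342, 1.0354),
  xstd   = c(0.40944, 0.41187, 0.41225, 0.41228, 0.41248, 0.41245, 0.41245)
)

# Row with the smallest cross-validated error.
best <- which.min(cp.table$xerror)

# Assumed 1-SE rule: the first (smallest) tree whose xerror is no more
# than one xstd above the minimum xerror.
threshold <- cp.table$xerror[best] + cp.table$xstd[best]
one.se <- min(which(cp.table$xerror <= threshold))

print(cp.table[best, ])    # row with minimum xerror
print(cp.table[one.se, ])  # row selected by the assumed 1-SE rule
```

With these numbers both picks land on row 1 (nsplit = 0), which, if I am applying
the rule correctly, would mean that no split improves on the root node under
cross-validation.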
Any advice would be greatly appreciated - thanks in advance!
-Stephanie
Stephanie Mather
Graduate Research Assistant
University of Connecticut
Dept of Civil & Enviro. Engineering
261 Glenbrook Rd, Unit 2037
Storrs, CT 06269-2037