A few suggestions:
1) Turn off cross-validation and surrogate splits (xval=0, maxsurrogate=0 in
rpart.control). That should speed things up, since with cross-validation on,
rpart grows one extra tree per fold.
2) Try using a smaller subset just to make sure you've got the code working
(say n=500).
3) There is a stand-alone version of rpart out on statlib in the 'general'
section. This may work best for extremely large datasets.
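Here is how suggestions 1 and 2 might look in R, whose rpart package descends from the S library discussed here. Argument names could differ in older S-PLUS releases. This is a minimal sketch: the data frame and the names dat, y, x1, x2, c1 and c2 are hypothetical stand-ins shaped like the poster's data.

```r
library(rpart)

# Hypothetical stand-in data with the shape described in the question:
# 10,000 rows, an 11-level class, two continuous predictors and two
# categorical predictors with 5 and 4 levels. Substitute the real data.
set.seed(1)
n <- 10000
dat <- data.frame(
  y  = factor(sample(1:11, n, replace = TRUE)),
  x1 = rnorm(n),
  x2 = runif(n),
  c1 = factor(sample(letters[1:5], n, replace = TRUE)),
  c2 = factor(sample(LETTERS[1:4], n, replace = TRUE))
)

# Suggestion 1: skip the cross-validation refits and the surrogate search.
fast_ctrl <- rpart.control(xval = 0, maxsurrogate = 0)

# Suggestion 2: debug the call on a random subset of 500 rows first.
small <- dat[sample(nrow(dat), 500), ]
fit_small <- rpart(y ~ x1 + x2 + c1 + c2, data = small,
                   method = "class", control = fast_ctrl)
print(fit_small)
```

Once the small fit behaves, the same call can be pointed at the full data frame.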
~> I am trying to use rpart for a classification problem with 10,000
~> observations. The dependent variable has 11 levels and I have 4
~> predictors: 2 continuous and 2 categorical (with 5 and 4 levels
~> respectively). It runs fine if I take the defaults, but if I use a loss
~> matrix (with off-diagonal elements equal to the absolute value of the
~> misclassification error), it just runs forever and never finishes.
~>
~> Am I being too ambitious here and the problem is just too big? (10,000
~> is 25% of the full dataset).
~>
~> Carlos Alzola
~> calzola@apa.com
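For the loss-matrix case in the question, here is a sketch along the same lines, again using R's rpart with hypothetical data and variable names. It assumes that "absolute value of the misclassification error" means a cost of |i - j| for ordered classes, where i is the true class and j the predicted one. That interpretation is a guess. Timing the default fit and the loss-matrix fit side by side should show whether the loss matrix alone is what makes the run slow.

```r
library(rpart)

# Hypothetical stand-in data shaped like the data in the question.
set.seed(2)
n <- 10000
dat <- data.frame(
  y  = factor(sample(1:11, n, replace = TRUE)),
  x1 = rnorm(n),
  x2 = runif(n),
  c1 = factor(sample(letters[1:5], n, replace = TRUE)),
  c2 = factor(sample(LETTERS[1:4], n, replace = TRUE))
)

# Assumed reading of the loss: classes are ordered and predicting class j
# when the truth is class i costs |i - j|. rpart requires zeros on the
# diagonal and positive off-diagonal entries, which this satisfies.
k <- nlevels(dat$y)
cls <- as.numeric(seq_len(k))
loss <- abs(outer(cls, cls, "-"))

ctrl <- rpart.control(xval = 0, maxsurrogate = 0)

# Time the default fit and the loss-matrix fit on the same data, so any
# slowdown can be attributed to the loss matrix itself.
t_default <- system.time(
  fit_default <- rpart(y ~ x1 + x2 + c1 + c2, data = dat,
                       method = "class", control = ctrl))
t_loss <- system.time(
  fit_loss <- rpart(y ~ x1 + x2 + c1 + c2, data = dat, method = "class",
                    parms = list(loss = loss), control = ctrl))
print(rbind(default = t_default, loss = t_loss)[, "elapsed"])
```

If only the loss-matrix fit is slow, the loss matrix is the culprit. If both fits are slow, the cross-validation, surrogate and subset suggestions apply regardless.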
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news