Dear S+ users,
Last week I posted a question concerning the v-fold cross-validation
estimates for trees after applying misclassification costs to the
classes. For my data set, the CV estimates got bigger for each model size,
since more cases of the less heavily weighted class got misclassified in
favor of the more heavily weighted class. I asked for comments on how to
evaluate the goodness of the model and whether I could trust these estimates.
Many thanks to Frank Harrell, Peter Malewski, Charles Berry and Andrew
Sinclair, who gave friendly comments and advice.
One piece of advice was to pose this question to another newsgroup,
'sci.stat.consult'.
Another good piece of advice was not to expect too much of my data. My
sample size of 1400 (which is rather large for psychotherapy data) is
apparently much too small to build a discriminating and well-calibrated
model on.
Of course there is no doubt about the trustworthiness of the estimates;
they just don't capture exactly what I want.
Another suggestion was to look at the misclassification costs separately
for each class, and to determine how many cases the model classifies
correctly by means of a confusion matrix.
Using the confusion matrix in combination with cross-validation on a
separate test sample will probably give the best estimates and make it
easier to decide on the right model.
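As a rough illustration of the confusion-matrix suggestion, here is a
minimal sketch in Python (not S-PLUS). It assumes you already have the
true and predicted classes for a held-out test sample; the class labels,
predictions and cost values below are made up for illustration only. It
tabulates true versus predicted classes, reports the fraction of each class
classified correctly, and averages the misclassification costs per case.

```python
# Minimal sketch: confusion matrix, per-class accuracy and cost-weighted
# error for a classifier evaluated on a held-out sample.
# All labels, predictions and costs here are hypothetical examples.

def confusion_matrix(actual, predicted, classes):
    """counts[i][j] = number of cases of true class i predicted as class j."""
    index = {c: k for k, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        counts[index[a]][index[p]] += 1
    return counts

def per_class_accuracy(counts, classes):
    """Fraction of each true class that is classified correctly."""
    result = {}
    for i, c in enumerate(classes):
        total = sum(counts[i])
        result[c] = counts[i][i] / total if total else float("nan")
    return result

def mean_cost(counts, costs):
    """Average cost per case; costs[i][j] is the cost of predicting j for true i."""
    n = sum(sum(row) for row in counts)
    k = len(counts)
    total = sum(counts[i][j] * costs[i][j] for i in range(k) for j in range(k))
    return total / n

if __name__ == "__main__":
    classes = ["A", "B"]
    actual    = ["A", "A", "A", "B", "B", "B", "B", "A"]
    predicted = ["A", "B", "A", "B", "B", "A", "B", "A"]
    costs = [[0, 1],   # true A: predicting B costs 1
             [3, 0]]   # true B: predicting A costs 3 (B weighted more heavily)
    cm = confusion_matrix(actual, predicted, classes)
    print(cm)                               # [[3, 1], [1, 3]]
    print(per_class_accuracy(cm, classes))  # {'A': 0.75, 'B': 0.75}
    print(mean_cost(cm, costs))             # 0.5
```

Looking at the per-class rows rather than a single overall error rate shows
directly how many cases of the less heavily weighted class are being given
up in favor of the more heavily weighted one.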
A final piece of advice was to look up this issue in Pattern Recognition
and Neural Networks by Prof. Ripley, where this topic is treated with the
necessary care.
Thank you all very much for the helpful comments.
wolfgang
*********************************************
Wolfgang Hannoever
Forschungsstelle fuer Psychotherapie
Christian-Belser-Str. 79a
70597 Stuttgart
Tel.: 0711 / 6781-407 Fax:0711 / 6876902
e-mail:hann@psyres-stuttgart.de
*********************************************
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news