Hi Jim,
The Y-shuffle approach is really a permutation method. The problem is that
permuting only the Y's breaks up every correlation with the X's at once. So you
can use this approach to test Y=f(x) vs Y=mu, but not Y=f1(x)+f2(x) vs Y=f1(x),
that is, the important case of comparing a full model against a reduced
model. The other approach, using 'hold-out' data sets, is a form of
cross-validation. In k-fold cross-validation you hold out 1/k of the data, fit
your model to the rest, check it against the held-out part, and repeat this k
times so that each part is held out once. If k = N you have the leave-one-out
approach. Another approach you should consider is bootstrapping.
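To make the Y-shuffle idea concrete, here is a minimal sketch in Python (not JMP) of a permutation test of Y=f(x) vs Y=mu for a single predictor. The function names, the toy data, and the choice of R^2 as the test statistic are my own assumptions for illustration only.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a straight-line fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def y_shuffle_p_value(x, y, n_perm=2000, seed=0):
    """Permutation p-value for H0: Y is unrelated to x (Y = mu + noise)."""
    rng = random.Random(seed)
    observed = r_squared(x, y)
    y_perm = list(y)          # work on a copy so the caller's data is untouched
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)   # breaks ALL association between Y and x
        if r_squared(x, y_perm) >= observed:
            hits += 1
    # The +1 counts the observed arrangement as one of the permutations.
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy data with a strong linear trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1, 9.2, 9.9]
p = y_shuffle_p_value(x, y)
print(f"permutation p-value: {p:.4f}")
```

Because the shuffle destroys the link between Y and every X at the same time, the reference distribution it produces corresponds to "no X matters at all", which is why it cannot isolate the contribution of f2(x) once f1(x) is in the model.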
Look for books by Philip Good and ones by Fortunato Pesarin.
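The k-fold scheme described above can be sketched as follows, again in Python rather than JMP. This is only an illustration under my own assumptions: the helper names are hypothetical, the model is a simple straight-line fit, and the rows are assumed to be in random order already (otherwise shuffle them before splitting).

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists; k == n gives leave-one-out."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope

def cv_mse(x, y, k):
    """Mean squared prediction error over all held-out points."""
    sq_errors = []
    for train, test in k_fold_indices(len(x), k):
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        sq_errors.extend((y[i] - (a + b * x[i])) ** 2 for i in test)
    return sum(sq_errors) / len(sq_errors)

# Hypothetical toy data.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1, 9.2, 9.9]
print("5-fold CV MSE:      ", cv_mse(x, y, 5))
print("leave-one-out MSE:  ", cv_mse(x, y, len(x)))
```

Note that any model selection step (for example, choosing which X's to include) would have to be repeated inside each fold for the held-out error to be an honest estimate.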
Cheers
Gunter
-----Original Message-----
From: jmp-l-owner@lists.biostat.wustl.edu
[mailto:jmp-l-owner@lists.biostat.wustl.edu]On Behalf Of James T Metz
Sent: Tuesday, 14 June 2005 9:23 AM
To: jmp-l@lists.biostat.wustl.edu
Cc: James T Metz
Subject: [jmp-l] model validation - data scrambling versus hold out sets?
JMP Users,
I have a general question concerning model validation. Does anyone have
any thoughts or comments concerning (multiple) Y-data scrambling (using the
column shuffle option in JMP) versus hold-out data sets (using excluded rows)
as a means to "validate" models? Is one method generally preferred over the
other? Is one method generally better for regression while another is better
for partition models, etc.? Is the number of observations important?
Case in point: I have a data set of about 15 observables (Y values). I can
obtain > 5000 X values (descriptors or columns) for each of the rows.
Obviously, there is a great and highly likely danger of chance correlation. I
could use either method mentioned above to "validate" generated models.
However, my intuition says that the hold-out method is not appropriate in this
case, since my data set is so small. Do others agree?
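The danger of chance correlation with these dimensions can be illustrated with a small simulation. This is only a sketch in Python under assumed conditions: both Y and all descriptors are pure Gaussian noise, and the function names and default sizes (15 rows, 5000 descriptors) are chosen to mirror the case described above, not taken from any real data.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def best_chance_r2(n_rows=15, n_descriptors=5000, seed=0):
    """Largest single-descriptor R^2 when Y and every X are unrelated noise."""
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_descriptors):
        x = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, r_squared(x, y))
    return best

best = best_chance_r2()
print(f"best single-descriptor R^2 on pure noise: {best:.2f}")
```

Even though no descriptor has any real relationship to Y here, picking the best of thousands of candidates yields an apparently strong fit, which is the effect any validation scheme has to guard against.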
I welcome thoughts, comments, literature references, etc.
Regards,
Jim Metz
James T. Metz, Ph.D.
Research Investigator Chemist
GPRD R46Y AP10-2
Abbott Laboratories
100 Abbott Park Road
Abbott Park, IL 60064-6100
U.S.A.
Office (847) 936 - 0441
FAX (847) 935 - 0548
james.metz@abbott.com
CSL Limited A.C.N. 051 588 348
45 Poplar Road Parkville Victoria 3052 Australia
Phone: +61 3 9389 1911 Fax: +61 3 9389 1434