
model validation - data scrambling versus hold out sets?

To: jmp-l@lists.biostat.wustl.edu
Subject: model validation - data scrambling versus hold out sets?
From: "James T Metz" <james.metz@abbott.com>
Date: Mon, 13 Jun 2005 18:22:34 -0500
Cc: "James T Metz" <james.metz@abbott.com>

JMP Users,

        I have a general question about model validation.  Does anyone have thoughts or comments on
(multiple) Y-scrambling (using the column shuffle option in JMP) versus hold-out data sets (using excluded rows) as
a means to "validate" models?  Is one method generally preferred over the other?  Is one method better suited to regression
and the other to partition models, etc.?  Does the number of observations matter?

        Case in point: I have a data set of about 15 observations (rows, each with a Y value).  I can obtain > 5000 X values
(descriptors or columns) for each row.  With far more descriptors than rows, chance correlation is a serious and highly likely danger.
I could use either method mentioned above to "validate" the generated models.  However, my intuition says that the hold-out method
is not appropriate in this case, since my data set is so small.  Do others agree?
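
        To make the chance-correlation concern concrete, here is a minimal sketch in Python (not JMP) of Y-scrambling
on a data set of the size described above.  Everything in it is an assumption made for illustration: the data are pure random noise,
the "model" is simply the single descriptor with the largest |Pearson r|, and the function names (standardize, best_abs_r, y_scramble)
are hypothetical.  The point it tries to show is that the descriptor search is repeated on every shuffled copy of Y, so the
scrambled results reflect the same selection process as the real fit.

```python
import math
import random


def standardize(values):
    """Center a sequence and scale it to unit length, so that the dot product
    of two standardized sequences equals their Pearson correlation."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def best_abs_r(std_columns, y):
    """Largest |Pearson r| between y and any (already standardized) column."""
    y_std = standardize(y)
    return max(abs(sum(a * b for a, b in zip(col, y_std))) for col in std_columns)


def y_scramble(std_columns, y, n_shuffles, rng):
    """Return (observed best |r|, permutation p-value).

    The selection step (pick the best of all columns) is rerun on each
    shuffled copy of y, so the null distribution includes the search itself.
    """
    observed = best_abs_r(std_columns, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if best_abs_r(std_columns, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (hits + 1) / (n_shuffles + 1)


rng = random.Random(0)      # fixed seed, only for repeatability
n_rows, n_cols = 15, 5000   # sizes taken from the case described above

# Assumption: pure-noise descriptors and responses, so no real X-Y relationship exists.
columns = [standardize([rng.gauss(0, 1) for _ in range(n_rows)]) for _ in range(n_cols)]
y = [rng.gauss(0, 1) for _ in range(n_rows)]

# 20 shuffles keeps the demo fast; a real check would use many more.
observed, p_value = y_scramble(columns, y, n_shuffles=20, rng=rng)
print(f"best |r| among {n_cols} noise columns: {observed:.2f}")
print(f"Y-scrambling p-value: {p_value:.2f}")
```

        With 15 rows and 5000 noise columns, the best |r| found by the search is typically large even though no real relationship
exists, which is why the scrambled runs have to go through the same descriptor search as the real one.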

        I welcome thoughts, comments, literature references, etc.

        Regards,
        Jim Metz


James T. Metz, Ph.D.
Research Investigator Chemist

GPRD R46Y AP10-2
Abbott Laboratories
100 Abbott Park Road
Abbott Park, IL  60064-6100
U.S.A.

Office (847) 936 - 0441
FAX    (847) 935 - 0548

james.metz@abbott.com
