jmp-l

Re: model validation - data scrambling versus hold out sets?

To: jmp-l@lists.biostat.wustl.edu
Subject: Re: model validation - data scrambling versus hold out sets?
From: "David Ikle" <david.ikle@wilm.ppdi.com>
Date: Mon, 13 Jun 2005 21:40:23 -0400
Cc: James T Metz <james.metz@abbott.com>
Organization: PPD
References: <OF8001E185.8AA3F7A0-ON8625701F.007F2BC1@northamerica.intra.abbott.com>
My first question is: what kind of problem generates more than 5000 Xs
for each of 15 observations?

David

James T Metz wrote:
> 
> JMP Users,
> 
>         I have a general question about model validation.  Does
> anyone have thoughts or comments on Y scrambling (repeatedly
> shuffling the Y data using the column shuffle option in JMP) versus
> hold-out data sets (using excluded rows) as a means to "validate"
> models?  Is one method generally preferred over the other?  Is one
> method generally better for regression while the other is better for
> partition models, etc.?  Does the number of observations matter?
> 
>         Case in point: I have a data set of about 15 observations (Y
> values).  I can obtain more than 5000 X values (descriptors, or
> columns) for each of the rows.  Obviously, there is a serious and
> highly likely danger of chance correlation.  I could use either method
> mentioned above to "validate" generated models.  However, my intuition
> says that the hold-out method is not appropriate in this case, since
> my data set is so small.  Do others agree?
> 
>         I welcome thoughts, comments, literature references, etc.
> 
>         Regards,
>         Jim Metz
> 
> James T. Metz, Ph.D.
> Research Investigator Chemist
> 
> GPRD R46Y AP10-2
> Abbott Laboratories
> 100 Abbott Park Road
> Abbott Park, IL  60064-6100
> U.S.A.
> 
> Office (847) 936 - 0441
> FAX    (847) 935 - 0548
> 
> james.metz@abbott.com
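
The Y-scrambling idea in the quoted question can be sketched outside JMP as a permutation test. What follows is only an illustration under stated assumptions, not the poster's procedure: the "model" is a stand-in that picks the single descriptor with the largest absolute Pearson correlation to Y, the data are simulated noise, and the names pearson, best_abs_corr and y_scramble are hypothetical helpers. The descriptor search is repeated for every shuffle, so the scrambled scores reflect the chance correlation introduced by searching many columns.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_abs_corr(columns, y):
    """Stand-in 'model' (an assumption): score of the descriptor most correlated with y."""
    return max(abs(pearson(col, y)) for col in columns)


def y_scramble(columns, y, n_shuffles=100, seed=0):
    """Compare the best |r| on the real y with the best |r| after shuffling y.

    Returns (observed, scrambled_scores, p_value). The search over descriptors
    is redone for each shuffle, so the null scores include the selection step.
    """
    rng = random.Random(seed)
    observed = best_abs_corr(columns, y)
    scrambled = []
    for _ in range(n_shuffles):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break any real X-Y link, keep the Y distribution
        scrambled.append(best_abs_corr(columns, y_perm))
    # Add-one correction keeps the estimated p-value above zero.
    hits = sum(s >= observed for s in scrambled)
    p_value = (hits + 1) / (n_shuffles + 1)
    return observed, scrambled, p_value


if __name__ == "__main__":
    rng = random.Random(42)
    n_obs, n_desc = 15, 500  # 500 rather than >5000 only to keep the run short
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    # Simulated pure-noise descriptors: none is truly related to y.
    columns = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_desc)]

    observed, scrambled, p = y_scramble(columns, y)
    print(f"best |r| on real y:        {observed:.2f}")
    print(f"median best |r| scrambled: {statistics.median(scrambled):.2f}")
    print(f"permutation p-value:       {p:.2f}")
```

With noise-only descriptors like these, the best correlation on the real Y should usually look about as good as the scrambled ones, which is the chance-correlation danger the question describes. A real signal would show up as an observed score that few or no scrambled runs reach.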

