
Re: More GOF questions...

To: "Kim Elmore" <Kim.Elmore@noaa.gov>
Subject: Re: More GOF questions...
From: "Brian S Cade" <brian_cade@usgs.gov>
Date: Wed, 10 Nov 2004 08:32:57 -0700
Cc: Nels Tomlinson <nels_tomlinson@labor.state.ak.us>, S-News <s-news@wubios.wustl.edu>, s-news-owner@lists.biostat.wustl.edu
Kim:  Just to clarify one approach I suggested: I intended that the
data be dithered by adding a small random uniform number, e.g. from
[-0.5, 0.5], to break the discreteness of the data before computing the
KS GOF statistic. After getting this statistic on the dithered data, you
then compute the P-value for the KS GOF statistic using conventional
distributional approximations or a resampling approach.

Brian

Brian S. Cade

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  brian_cade@usgs.gov
tel:  970 226-9326


                                                                                
                                   
"Kim Elmore" <Kim.Elmore@noaa.gov>
Sent by: s-news-owner@lists.biostat.wustl.edu
11/09/2004 03:41 PM
To: Nels Tomlinson <nels_tomlinson@labor.state.ak.us>, S-News <s-news@wubios.wustl.edu>
cc:
Subject: Re: [S] More GOF questions...




Hi Nels,

Thanks for the reply.  This is intriguing, but I'm wondering: if the
continuous KS test can't "properly" be applied to discrete data, then what
you propose sounds something like: "improperly" apply the KS test in a
Monte Carlo sense, and look at the distribution of the resulting
"inappropriate" p-values in light of the original "inappropriate" p-value.
Have I got it?  Is that kosher?

I see that this has a certain appeal, but in this case the p-value isn't
really a p-value in the classical sense, but simply a statistic, right?
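This sketch shows why the continuous KS test misbehaves on the rounded data in the quoted original message. Even with no sampling noise at all, the step CDF of the rounded data sits a fixed distance from any continuous uniform CDF. The rounded data follow a discrete uniform on 1, ..., 16. The KS statistic is then scaled by sqrt(n).

The Python below uses the hypothetical name `ks_distance_discrete_vs_uniform`. It computes that population-level distance exactly with fractions for the two parameter choices Kim tried.

```python
from fractions import Fraction


def uniform_cdf(x, a, b):
    """CDF of a continuous Uniform(a, b), evaluated exactly with Fractions."""
    if x <= a:
        return Fraction(0)
    if x >= b:
        return Fraction(1)
    return (x - a) / (b - a)


def ks_distance_discrete_vs_uniform(support, a, b):
    """sup_x |G(x) - F(x)|, where G is the CDF of a discrete uniform on the sorted
    values in `support` and F is the Uniform(a, b) CDF. Between jumps G is flat and
    F is monotone, so checking both sides of every jump finds the supremum."""
    a, b = Fraction(a), Fraction(b)
    m = len(support)
    d = Fraction(0)
    for i, k in enumerate(support):
        f = uniform_cdf(Fraction(k), a, b)
        below = Fraction(i, m)    # G just before the jump at k
        at = Fraction(i + 1, m)   # G at k and up to the next jump
        d = max(d, abs(f - below), abs(at - f))
    return d


values = range(1, 17)  # rounded Uniform(0.5, 16.5) data take the values 1..16
print(ks_distance_discrete_vs_uniform(values, 0.5, 16.5))  # 1/32
print(ks_distance_discrete_vs_uniform(values, 1, 16))      # 1/16
```

Scaled by sqrt(540), about 23.2, these distances are roughly 0.73 and 1.45 before any sampling noise is added. That is consistent with the `min=1, max=16` call rejecting much more strongly than the `min=0.5, max=16.5` call.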

I tried the idea floated by Brian Cade and got interesting results, but I
have to wonder if it is appropriate, too. I did a B=1000 bootstrap of the
KS p-values for my original continuous data. I then generated 1000 Monte
Carlo samples by adding uniform [-0.5, 0.5] "noise" to the same discrete
data set (generated by rounding the continuous data to zero decimal
places). I got very similar mean p-values, but *very* different
distributions of the bootstrapped and discrete Monte Carlo p-values. The
bootstrapped KS p-value distribution looks like a beta distribution with
p < 1 and q > 2 (roughly), while the Monte Carlo/discrete data produce a
normal-looking distribution, which I expected.

I have not yet tried Sam Buttrey's suggestion because I'm wary of
fabricating my own test statistic for a peer-reviewed paper in meteorology;
doing so may unhinge a reviewer. The Cramer-von Mises test will probably
be new to most, but I plan to introduce it nonetheless (thanks very
much to Richard Lockhart). Meteorologists are familiar with Monte Carlo
processes, and so are likely to accept Monte Carlo approaches with standard
statistics, such as the KS test.

So far, I'm learning a lot!

Kim Elmore

At 12:37 PM 11/9/2004, you wrote:
>What I did (several years ago, so I'm pretty sure I'm remembering right)
>was to generate 10,000 data sets under the null hypothesis, compute the KS
>statistic for each, order them, and take the 250th and the 9,750th as the
>bounds of the central 95% of the null distribution.  If the KS statistic
>for your actual data falls outside that range, you can reject the null at
>the 5% level.
>
>What made it interesting is that it's only recently that analysis has
>advanced enough to enable mathematicians to prove that it worked, even
>though it seemed obvious to me that it must converge.
>
>As for Brian's reasonable suggestion about avoiding loops: he's right;
>don't loop!
>
>I avoided looping by filling a 10,000 by N array with the random data,
>then apply()-ing the KS function to its 10,000 rows.  It was fast enough
>several years ago, and is probably faster today.
>
>Nels
>
>
>-----Original Message-----
>From: s-news-owner@lists.biostat.wustl.edu
>[mailto:s-news-owner@lists.biostat.wustl.edu]On Behalf Of Kim Elmore
>Sent: Thursday, November 04, 2004 12:07 PM
>To: S-News
>Subject: [S] More GOF questions...
>
>
>Now that I understand my error when looking at chi-square GOF results
>using my dummy data, I tried the same thing with a KS GOF test. My
>benighted state continues. Here is what I see:
>
>  > set.seed(981)
>  > test.cont.dat <- runif(540, min=0.5, max=16.5)
>  > test.disc.dat <- round(test.cont.dat, 0)
>  > ks.gof(test.cont.dat, dist="unif", min=0.5, max=16.5)$p.value
>[1] 0.5324786
>  > ks.gof(test.disc.dat, dist="unif", min=1, max=16)$p.value
>[1] 0.003955284
># and just in case...
>  > ks.gof(test.disc.dat, dist="unif", min=0.5, max=16.5)$p.value
>[1] 0.06938681
>
>I know that the test statistic is quite different between the chi-square
>and the KS tests, but there is clearly something I have missed about the
>KS test.  I do not necessarily expect the same p-values for continuous and
>discrete data, but why are the results from the discrete data so vastly
>different? Is the KS test (or should it be) limited to continuous data?
>
>Kim Elmore
>                            Kim Elmore, Ph.D.
>                         University of Oklahoma
>          Cooperative Institute for Mesoscale Meteorological Studies
>"All of weather is divided into three parts: Yes, No, and Maybe. The
>greatest of these is Maybe" The original Latin appears to be garbled.
>
>--------------------------------------------------------------------
>This message was distributed by s-news@lists.biostat.wustl.edu.  To
>unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
>the BODY of the message:  unsubscribe s-news
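This sketches the Monte Carlo procedure Nels describes, applied to Kim's discrete case. It simulates many data sets under the null and computes the same KS statistic for each. The observed statistic is then compared with that simulated distribution. Here the null is assumed to be Uniform(0.5, 16.5) draws rounded to integers. That choice of discrete null is my assumption.

The KS test rejects for large values of D. So alongside the 250th/9,750th order statistics Nels mentions, the sketch also reports a one-sided Monte Carlo p-value. The code is Python, the names are hypothetical, and `n_sims` defaults to Nels's 10,000.

```python
import random


def ks_stat_uniform(x, a, b):
    """One-sample KS statistic of x against a continuous Uniform(a, b)."""
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, v in enumerate(xs):
        f = min(max((v - a) / (b - a), 0.0), 1.0)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def simulate_null_stats(n, a, b, n_sims, rng):
    """KS statistics for n_sims data sets drawn under an assumed discrete null:
    Uniform(a, b) values rounded to the nearest integer."""
    return [ks_stat_uniform([round(rng.uniform(a, b)) for _ in range(n)], a, b)
            for _ in range(n_sims)]


def monte_carlo_ks(observed, a, b, n_sims=10_000, seed=1):
    """Return (observed D, central 95% range of the null D's, one-sided MC p-value)."""
    rng = random.Random(seed)
    d_obs = ks_stat_uniform(observed, a, b)
    null = sorted(simulate_null_stats(len(observed), a, b, n_sims, rng))
    k_lo = max(round(0.025 * n_sims), 1)   # 250 when n_sims = 10,000
    k_hi = max(round(0.975 * n_sims), 1)   # 9,750 when n_sims = 10,000
    lo, hi = null[k_lo - 1], null[k_hi - 1]  # 1-based order statistics
    # large D is evidence against the null; the +1 counts the observed data set itself
    p = (1 + sum(s >= d_obs for s in null)) / (n_sims + 1)
    return d_obs, (lo, hi), p


if __name__ == "__main__":
    rng = random.Random(981)  # not the S-PLUS random stream
    disc = [round(rng.uniform(0.5, 16.5)) for _ in range(540)]
    print(monte_carlo_ks(disc, 0.5, 16.5, n_sims=1000))  # Nels used 10,000
```

The statistic is computed the same way for the observed and simulated data. It is still the "continuous" D against Uniform(0.5, 16.5). The simulated reference distribution, not the continuous-data tables, supplies the p-value.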

                           Kim Elmore, Ph.D.
                        University of Oklahoma
         Cooperative Institute for Mesoscale Meteorological Studies
"All of weather is divided into three parts: Yes, No, and Maybe. The
greatest of these is Maybe" The original Latin appears to be garbled.
