Gregor Gorjanc wrote:
Frank E Harrell Jr <f.harrell <at> vanderbilt.edu> writes:
Bikash Jain wrote:
Hi,
I'm using the Environmental Stats module to fit a best distribution
to a single column of data using the KS method. Based on the
p.value I get the
[...]
Any method that uses the empirical CDF or Kaplan-Meier estimator as
the standard will inherit the imprecision of the ECDF, making one
wonder why fitting a curve will help. P-values from this approach
will be too low. Model uncertainty is being covered up.
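To put a rough number on "the imprecision of the ECDF", here is a minimal sketch in Python (standard library only). It is not from the original posts: it assumes a complete, uncensored i.i.d. sample and uses the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, under which a simultaneous (1 - alpha) band around the ECDF has half-width sqrt(ln(2/alpha) / (2n)). The function names `ecdf`, `dkw_half_width`, and `dkw_band` are made up for illustration.

```python
import math

def ecdf(sample):
    """Return the sorted sample and the ECDF value at each sorted point."""
    xs = sorted(sample)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def dkw_half_width(n, alpha=0.05):
    """Half-width of the simultaneous (1 - alpha) DKW band around the ECDF."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

def dkw_band(sample, alpha=0.05):
    """Lower and upper band limits at each sorted point, clipped to [0, 1]."""
    xs, f = ecdf(sample)
    eps = dkw_half_width(len(xs), alpha)
    lower = [max(0.0, v - eps) for v in f]
    upper = [min(1.0, v + eps) for v in f]
    return xs, lower, upper

# Band half-widths at the 95% level for a few sample sizes.
for n in (25, 100, 400):
    print(n, round(dkw_half_width(n), 3))
```

With 100 observations the band is roughly plus or minus 0.14 on the probability scale, and it only halves when the sample size is quadrupled. Any curve judged against the ECDF is being judged against a target that is itself that uncertain.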
All fine comments, but what can we actually use instead?
If you don't know which distribution family to fit, you might as well go
nonparametric. Alternatively, use a rich enough family, estimate all
the parameters, and pay for the variance due to these parameters. As
Greenland said (Biometrics 2000):
"models need to be complex to capture uncertainty about the
relations ... an honest uncertainty assessment requires parameters
for all effects that we know may be present. This advice is implicit
in an antiparsimony principle often attributed to L. J. Savage 'All
models should be as big as an elephant' (see Draper, 1995)"
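
As one possible reading of "go nonparametric", the sketch below (Python, standard library only) estimates a quantity directly from the data and gets its uncertainty from a percentile bootstrap, without choosing a distribution family at all. The bootstrap is an illustrative choice, not something named in the thread, and `bootstrap_percentile_ci` with its parameters is a hypothetical helper. It assumes an i.i.d., uncensored sample.

```python
import random
import statistics

def bootstrap_percentile_ci(sample, stat=statistics.median,
                            n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(sample).

    Resamples the data with replacement n_boot times, recomputes the
    statistic each time, and returns the alpha/2 and 1 - alpha/2
    empirical percentiles of those replicates.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    reps = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return reps[lo_idx], reps[hi_idx]

# Simulated skewed data standing in for a single column of measurements.
gen = random.Random(42)
data = [gen.lognormvariate(0.0, 1.0) for _ in range(50)]

low, high = bootstrap_percentile_ci(data)
print("sample median:", statistics.median(data))
print("95% percentile interval:", (low, high))
```

The interval carries the sampling variability of the estimate itself, so nothing is hidden behind a family selected by a goodness-of-fit p-value.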
--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University