On Fri, 4 Aug 2006, Wim Kimmerer wrote:
> Hello S-newsers. I am trying to fit some data to a negative binomial
> distribution: I have ~70 sets of count data and want to fit each set
> separately, mainly to determine the extent to which the zeros are in excess of
> expectations, but also to assure myself that the NB is the correct
> distribution to describe these data, which should be distributed as an
> overdispersed Poisson (with the possible exception of extra zeros).
>
> (Note re NB: There are two ways of formulating it, one appropriate to
> Bernoulli trials and the other to continuous distributions. The parameter
> names used in the various formulations are different, and I have not found a
> place where these are explained very well. For example, rnbinom (Splus) has r
> and p, and rnegbin (MASS) has mu and theta. Unlike r, theta can take
> non-integer values.)
It is not just the names that differ: the parametrizations also do.
For explanations that I find far superior to yours see MASS (the book)
and the R help page for dnbinom.
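
A minimal sketch of how the two parametrizations relate, assuming R's dnbinom (whose help page documents both forms). The values of mu and theta below are arbitrary illustrations, not taken from any data set. With size = theta, the Bernoulli-trials form has prob = theta/(theta + mu), and the variance is mu + mu^2/theta.

```r
# Arbitrary illustrative values (assumptions, not estimates from real data).
mu    <- 3.2   # mean
theta <- 1.5   # dispersion ("size"); need not be an integer
x     <- 0:10

# Mean/dispersion form, the one used by rnegbin() and theta.ml() in MASS
p_mu <- dnbinom(x, size = theta, mu = mu)

# Bernoulli-trials form: the same distribution with prob = theta / (theta + mu)
prob   <- theta / (theta + mu)
p_prob <- dnbinom(x, size = theta, prob = prob)

# TRUE if the two forms give the same probabilities
same <- isTRUE(all.equal(p_mu, p_prob))

# Variance of the negative binomial; it exceeds the Poisson variance mu
nb_var <- mu + mu^2 / theta
```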
> I calculated the parameter mu (the mean) and calculated theta using theta.ml
> (MASS). However, the ks.gof function using negbinom has ONLY the discrete
> formulation, so that can't be used. It is easy enough to calculate the
> expected distribution from these parameters and then run the
> Kolmogorov-Smirnov test, but I have read that this test should be used against
> an expected distribution with KNOWN parameters, not with parameters calculated
> from the data.
>
> So given that I have to get the parameters from the data, is my only choice
> for testing goodness of fit to simulate? If so, I assume I could simply
> calculate the KS statistic repeatedly using samples from the NB distribution
> with the estimated parameters and see whether my sample KS statistic falls
> within 95% of the values. Is there a better way to do this?
For any discrete distribution with estimated parameters, the null distribution
of the Kolmogorov-Smirnov statistic is only approximate: the theory applies to
a completely specified continuous distribution. I would have thought a grouped
chi-squared test would be more appropriate, and for that the asymptotic
theory is well understood. (Indeed, ?ks.gof says the same.) This is
particularly appropriate if you are interested in the proportions of
zeros.
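
A grouped chi-squared test of this kind might be sketched in R as follows. This is an illustration under stated assumptions, not a definitive recipe: the counts are simulated stand-ins for one data set, the cut-off K for the pooled upper cell is arbitrary, and theta is fitted with base R's optimize() so that the sketch needs no packages (theta.ml from MASS should give essentially the same estimate). Because the parameters are estimated from the ungrouped counts, the reference distribution with cells - 1 - 2 degrees of freedom is itself only approximate.

```r
set.seed(1)
# Simulated counts stand in for one of the data sets (an assumption).
y <- rnbinom(200, size = 1.5, mu = 3)

# Fit: the ML estimate of mu is the sample mean; theta maximises the
# negative binomial log-likelihood with mu held at that value.
mu_hat <- mean(y)
loglik <- function(theta) sum(dnbinom(y, size = theta, mu = mu_hat, log = TRUE))
theta_hat <- optimize(loglik, interval = c(0.01, 100), maximum = TRUE)$maximum

# Group into cells 0, 1, ..., K-1 and ">= K"; K = 6 is an arbitrary choice
# that should keep the expected counts from becoming too small.
K <- 6
observed <- tabulate(pmin(y, K) + 1, nbins = K + 1)
p_cells  <- c(dnbinom(0:(K - 1), size = theta_hat, mu = mu_hat),
              pnbinom(K - 1, size = theta_hat, mu = mu_hat, lower.tail = FALSE))
expected <- length(y) * p_cells

# Pearson statistic; df = number of cells - 1 - 2 estimated parameters
X2      <- sum((observed - expected)^2 / expected)
df      <- length(observed) - 1 - 2
p_value <- pchisq(X2, df, lower.tail = FALSE)

# Signed Pearson residual for the zero cell: positive means more zeros
# were observed than the fitted negative binomial expects.
zero_resid <- (observed[1] - expected[1]) / sqrt(expected[1])
```

The residual for the zero cell speaks directly to the question of excess zeros, while the overall statistic assesses the fit as a whole.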
--
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595