Good Day All,
I have a negative binomial model, fitted with glm.nb from the MASS
library, that I am trying to cross-validate. One of the
performance measures I'd like to use is McFadden's adjusted (pseudo)
R2, which is calculated as
R2mf = 1 - [(ln Lhat(Mfull) - (k+1)) / ln Lhat(Mcon)]
where:
Lhat(Mfull) = Estimated likelihood, model with predictors
Lhat(Mcon) = Estimated likelihood, Model without predictors
k = number of independent predictors
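To make the formula concrete, here is a minimal sketch in R. The helper name mcfadden_adj and the two log-likelihood values are made up purely for illustration; they are not from my data.

```r
# Adjusted McFadden pseudo-R2 from two log-likelihoods.
# ll_full: log-likelihood of the model with predictors
# ll_null: log-likelihood of the model without predictors
# k:       number of predictors (the +1 accounts for the intercept)
mcfadden_adj <- function(ll_full, ll_null, k) {
  1 - (ll_full - (k + 1)) / ll_null
}

# Hypothetical log-likelihoods, for illustration only
r2 <- mcfadden_adj(ll_full = -80, ll_null = -100, k = 4)
print(r2)
```

Here the penalised full log-likelihood is -80 - 5 = -85, so the result is 1 - 0.85 = 0.15.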
For cross-validation, I split my data into two groups, an 80% training
set and a 20% test set, via
>Iter1.80 <- sample(1:514, floor(0.8*514), replace=FALSE)
>Iter1.20 <- setdiff(1:514, Iter1.80)
and then I created the model nb.Iter1 (built on 80% of the data) via
>nb.Iter1 <-glm.nb(formula = TOTCASES ~ TECI + CENT43 + SQ.CENT43 +
CENT.INC + SQ.CENT.IN + offset(log(ADJ.POP)), data =
LymeDisease[Iter1.80,], na.action= na.omit, control = glm.control(maxit
= 500))
Assuming Deviance = -2 * ln Lhat(Mfull), by substitution I can then
calculate McFadden's R2 of that model (with k + 1 = 6 for the five
predictors) via
>1 - (((nb.Iter1$deviance/-2) - 6)/(nb.Iter1$null.deviance/-2))
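One caveat I am not sure about: as far as I understand, the deviance stored in a glm object is measured against a saturated model, so deviance/-2 need not equal the log-likelihood itself. The sketch below, on simulated data (not the Lyme disease data), compares the two and computes the R2 from logLik() instead. The variable names are hypothetical, and for my real model the null fit would presumably also need the offset(log(ADJ.POP)) term.

```r
library(MASS)  # provides glm.nb and a logLik method for its fits

set.seed(1)
n <- 300
x <- rnorm(n)
# Simulated negative binomial counts, for illustration only
y <- rnbinom(n, size = 2, mu = exp(0.5 + 0.7 * x))
sim <- data.frame(y = y, x = x)

fit_full <- glm.nb(y ~ x, data = sim)  # model with the predictor
fit_null <- glm.nb(y ~ 1, data = sim)  # intercept-only model

ll_full <- as.numeric(logLik(fit_full))
ll_null <- as.numeric(logLik(fit_null))

# Side-by-side: the log-likelihood versus deviance / -2
print(c(logLik = ll_full, dev_over_minus2 = fit_full$deviance / -2))

# Adjusted McFadden R2 from the log-likelihoods (k = 1 predictor here)
k <- 1
r2_loglik <- 1 - (ll_full - (k + 1)) / ll_null
print(r2_loglik)
```

Note that in this sketch the null model gets its own estimate of theta, whereas null.deviance from a single glm.nb fit does not come from a separately fitted null model, so the two approaches need not agree.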
So now to my question:
I want to assess the 80% model performance by inspecting how well it
predicts the 20% that I reserved from model generation and I wanted to
use the McFadden R2 (as well as several other measures) to do this. I
know I can use the 20% data I left out to create predictions based on
the 80% model via
>Iter1.20.predict <- predict(nb.Iter1, LymeDisease[Iter1.20,],
type="response")
but I can't seem to figure out how to calculate deviance between my
knowns (in the 20% test data) and my unknowns (predictions of the 20%
using the 80% model). An additional complication is the large number
of zeros in my data.
Could anyone point me in the right direction? Right now I am
thinking that I might need to alter some of the code in glm.nb
in order to get the log-likelihoods for my 20% predictions, but any
assistance would be greatly appreciated.
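One alternative I have been wondering about, which might avoid touching glm.nb at all, is to evaluate the negative binomial density directly at the held-out counts using the predicted means and the theta estimated on the training data. The sketch below uses simulated data and hypothetical names (sim, train, test, pop); it assumes dnbinom() with its size/mu parameterisation matches the glm.nb model, and I am not certain the (k+1) penalty is still appropriate out of sample.

```r
library(MASS)  # provides glm.nb

set.seed(2)
n <- 500
# Simulated data with a population offset, for illustration only
sim <- data.frame(x = rnorm(n), pop = runif(n, 1000, 5000))
sim$y <- rnbinom(n, size = 1.5, mu = sim$pop * exp(-7 + 0.6 * sim$x))

# 80/20 split
train_idx <- sample(seq_len(n), floor(0.8 * n))
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

# Fit both models on the training data only
fit_full <- glm.nb(y ~ x + offset(log(pop)), data = train)
fit_null <- glm.nb(y ~ 1 + offset(log(pop)), data = train)

# Predicted means for the held-out rows (offset taken from newdata)
mu_full <- predict(fit_full, newdata = test, type = "response")
mu_null <- predict(fit_null, newdata = test, type = "response")

# Held-out log-likelihoods; zero counts need no special handling here
ll_full <- sum(dnbinom(test$y, size = fit_full$theta, mu = mu_full, log = TRUE))
ll_null <- sum(dnbinom(test$y, size = fit_null$theta, mu = mu_null, log = TRUE))

# McFadden-style R2 on the test set (k = 1 predictor in this sketch)
k <- 1
r2_test <- 1 - (ll_full - (k + 1)) / ll_null
print(c(ll_full = ll_full, ll_null = ll_null, r2_test = r2_test))
```

If this is a sensible route, the same two dnbinom() sums could presumably be computed from nb.Iter1 and an intercept-plus-offset model fitted to LymeDisease[Iter1.80,].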
--
-Don
Don Catanzaro, PhD Landscape Ecologist
dgcatanzaro@gmail.com 16144 Sigmond Lane
479-751-3616 Lowell, AR 72745