s-news

Re: Calculating McFadden R2

To: s-news@lists.biostat.wustl.edu
Subject: Re: Calculating McFadden R2
From: "Donald Catanzaro, PhD" <dgcatanzaro@gmail.com>
Date: Fri, 12 Sep 2008 12:04:23 -0500
Reply-to: dgcatanzaro@gmail.com
User-agent: Thunderbird 2.0.0.16 (Windows/20080708)
Good Day All,

I have a negative binomial model, fitted with glm.nb from the MASS library, that I am trying to cross-validate. One of the performance measures I'd like to use is McFadden's adjusted (pseudo) R2, which is calculated as

R2mf = 1 - [(ln Lhat(Mfull) - (k+1)) / ln Lhat(Mcon)]
where:

Lhat(Mfull) = Estimated likelihood, model with predictors
Lhat(Mcon) = Estimated likelihood, Model without predictors
k = number of independent predictors

For cross-validation, I split my data into two groups, an 80% subset and a 20% subset, via
>Iter1.80 <- sample(1:514, floor(0.8*514), replace=FALSE)
>Iter1.20 <- setdiff(1:514, Iter1.80)

and then I created the model nb.Iter1 (built on 80% of the data) via

>nb.Iter1 <-glm.nb(formula = TOTCASES ~ TECI + CENT43 + SQ.CENT43 + CENT.INC + SQ.CENT.IN + offset(log(ADJ.POP)), data = LymeDisease[Iter1.80,], na.action= na.omit, control = glm.control(maxit = 500))

and, taking Deviance = -2 * ln Lhat(Mfull), by substitution I can then calculate McFadden's R2 of that model via

>1 - (((nb.Iter1$deviance/-2) - 6)/(nb.Iter1$null.deviance/-2))
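As a cross-check on the substitution above, here is a minimal sketch that computes the same statistic directly from log-likelihoods with logLik(). It rests on assumptions that are not in my real analysis: the data frame dat and its columns x1, x2, pop and y are simulated stand-ins for LymeDisease, and the intercept-only model is refitted so that theta is re-estimated. The two routes need not agree, because a GLM deviance is measured relative to the saturated model rather than being -2 * ln Lhat itself, and, as far as I understand, the null.deviance stored by glm.nb is evaluated with theta held at the full model's estimate.

```r
library(MASS)  # provides glm.nb and a logLik method for its fits

set.seed(1)
# Hypothetical stand-in data; the real LymeDisease data frame is not used here.
n   <- 400
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), pop = runif(n, 1000, 50000))
dat$y <- rnbinom(n, size = 1.5,
                 mu = dat$pop * exp(-8 + 0.5 * dat$x1 - 0.3 * dat$x2))

# Model with predictors and intercept-only model, both with the same offset
full <- glm.nb(y ~ x1 + x2 + offset(log(pop)), data = dat)
null <- glm.nb(y ~ 1 + offset(log(pop)), data = dat)

k       <- length(coef(full)) - 1        # predictors, intercept excluded
ll.full <- as.numeric(logLik(full))      # ln Lhat(Mfull)
ll.null <- as.numeric(logLik(null))      # ln Lhat(Mcon)

# Adjusted McFadden R2 as defined above: 1 - (ln Lhat(Mfull) - (k+1)) / ln Lhat(Mcon)
r2.mf.adj <- 1 - (ll.full - (k + 1)) / ll.null
r2.mf.adj
```

Comparing r2.mf.adj with the deviance-based expression on the same fit would show how much the two definitions differ for a given data set.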

So now to my question,

I want to assess the 80% model's performance by inspecting how well it predicts the 20% that I held out from model fitting, and I want to use the McFadden R2 (as well as several other measures) to do this. I know I can use the 20% I left out to create predictions based on the 80% model via

>Iter1.20.predict <- predict(nb.Iter1, LymeDisease[Iter1.20,], type="response")

but I can't seem to figure out how to calculate deviance between my knowns (in the 20% test data) and my unknowns (predictions of the 20% using the 80% model). An additional complication is the large numbers of zeros in my data.

Could anyone point me in the right direction? Right now I am thinking that I might need to alter some of the code in glm.nb in order to get the log-likelihoods for my 20% predictions, but any assistance would be greatly appreciated.
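
To make the question concrete, here is one possible sketch of a held-out log-likelihood that does not modify glm.nb. It evaluates the negative binomial density with dnbinom() at the test-set predictions, using the theta estimated on the training data. Everything named here is an assumption for illustration: the simulated data frame dat, its columns x1, pop and y, and the helper heldout.loglik are hypothetical and not part of MASS or of my real analysis. Treat it as a sketch of the idea rather than an established out-of-sample McFadden R2.

```r
library(MASS)  # glm.nb

set.seed(2)
# Hypothetical stand-in data with many zeros; not the real LymeDisease data.
n   <- 514
dat <- data.frame(x1 = rnorm(n), pop = runif(n, 1000, 50000))
dat$y <- rnbinom(n, size = 1.2, mu = dat$pop * exp(-9 + 0.6 * dat$x1))

# 80/20 split, as in the post
train.idx <- sample(seq_len(n), floor(0.8 * n))
test.idx  <- setdiff(seq_len(n), train.idx)
train <- dat[train.idx, ]
test  <- dat[test.idx, ]

# Both models are fitted on the training data only
fit.full <- glm.nb(y ~ x1 + offset(log(pop)), data = train)
fit.null <- glm.nb(y ~ 1 + offset(log(pop)), data = train)

# Hypothetical helper: log-likelihood of observed counts y under the fitted
# model's predicted means for newdata, with theta fixed at the training estimate
heldout.loglik <- function(fit, newdata, y) {
  mu <- predict(fit, newdata = newdata, type = "response")
  sum(dnbinom(y, size = fit$theta, mu = mu, log = TRUE))
}

ll.full.test <- heldout.loglik(fit.full, test, test$y)
ll.null.test <- heldout.loglik(fit.null, test, test$y)

# Unadjusted McFadden-style ratio on the held-out 20%
r2.mf.test <- 1 - ll.full.test / ll.null.test
r2.mf.test
```

Zero counts are not a problem for this calculation, since dnbinom() assigns positive probability to zero. No (k+1) penalty is applied here, on the assumption that evaluating on held-out data already guards against overfitting; whether that is the right choice is part of what I am asking.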

--

-Don
Don Catanzaro, PhD                  Landscape Ecologist
dgcatanzaro@gmail.com               16144 Sigmond Lane
479-751-3616                        Lowell, AR 72745

