s-news
To: s-news@lists.biostat.wustl.edu
Subject: Re:
From: "Donald Catanzaro, PhD" <dgcatanzaro@gmail.com>
Date: Thu, 25 Sep 2008 15:30:15 -0500
Reply-to: dgcatanzaro@gmail.com
User-agent: Thunderbird 2.0.0.16 (Windows/20080708)
Hi All,

My apologies to the list as I lurch forward in my humble quest to cross-validate my dataset. As folks have seen, it is going rather slower than I had hoped, which is due more to my own shortcomings than to anything else.

I've been working on subsetting my dataset into an 80/20 split and creating a model with the 80% data and then using the remaining 20% for model validation. For performance measures of the 80% model I'd like to use the AIC and BIC coming from the 20% validation dataset.
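
To make the split concrete, here is a minimal sketch of how the index vectors might be built. It is written in Python rather than S purely for illustration, and the function name `split_indices`, the seed, and the sample size of 100 are all hypothetical, not taken from my actual analysis.

```python
import random

def split_indices(n, train_frac=0.8, seed=1):
    """Return (train, test) lists of row indices for a random split.

    Hypothetical helper: n is the number of rows, train_frac the share
    of rows used for model fitting (0.8 gives the 80/20 split).
    """
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    idx = list(range(n))
    rng.shuffle(idx)                   # random permutation of the row indices
    n_train = int(round(train_frac * n))
    return sorted(idx[:n_train]), sorted(idx[n_train:])

# Assumed sample size of 100 rows, for illustration only.
train_idx, test_idx = split_indices(100)
```

The training indices would play the role of the vector handed to the subset option, and the remaining indices identify the validation rows.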

It is rather nice that glm includes a subset option, so I can fit my model to 80% of the data by supplying the appropriate index vector. Is there a similar option that lets me run the 20% data through the 80%-derived GLM and pull out the deviance and log-likelihood without additional calculations?

If not, if I understand correctly, my other option would be to:
A) predict the 80% data points from the 80% model
B) find mu and size of the 80% predicted data points using fitdistr
C) calculate the log-likelihood of the 20% validation dataset using mu and size from the 80% predicted data points
D) calculate AIC and BIC from that log-likelihood
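
Steps C and D could be sketched roughly as follows, again in Python for illustration only. I am assuming that mu and size are the mean and dispersion of a negative binomial (the usual meaning of those two names), and that a single mu and size apply to every validation point, as steps B and C describe. The counts and parameter values below are made up, and `nb_loglik` and `aic_bic` are hypothetical helper names.

```python
import math

def nb_loglik(y, mu, size):
    """Log-likelihood of counts y under a negative binomial
    with mean mu and dispersion size (variance = mu + mu^2/size)."""
    ll = 0.0
    for yi in y:
        ll += (math.lgamma(yi + size) - math.lgamma(size) - math.lgamma(yi + 1)
               + size * math.log(size / (size + mu))
               + yi * math.log(mu / (size + mu)))
    return ll

def aic_bic(loglik, k, n):
    """AIC = -2*logLik + 2k; BIC = -2*logLik + k*log(n)."""
    return -2.0 * loglik + 2.0 * k, -2.0 * loglik + k * math.log(n)

# Made-up validation counts and made-up estimates from step B.
validation_counts = [0, 2, 1, 5, 3, 0, 1]
mu_hat, size_hat = 1.8, 2.5

ll = nb_loglik(validation_counts, mu_hat, size_hat)
# k = 2 counts only mu and size; a different k may be more appropriate.
aic, bic = aic_bic(ll, k=2, n=len(validation_counts))
```

What to use for k (the number of estimated parameters) and n (here, the number of validation points) is part of what I am unsure about.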

If I can't run the 20% data through my 80% model, would steps A-D get me where I'd like to be?

--

-Don
Don Catanzaro, PhD                  Landscape Ecologist
dgcatanzaro@gmail.com               16144 Sigmond Lane
479-751-3616                        Lowell, AR 72745

