
Re: AIC model selection and model averaging

To: "Huso, Manuela" <manuela.huso@oregonstate.edu>
Subject: Re: AIC model selection and model averaging
From: Spencer Graves <spencer.graves@PDF.COM>
Date: Mon, 21 Jul 2003 18:43:36 -0700
Cc: s-news@lists.biostat.wustl.edu
References: <0DA91383F539D14EA4AB01A77ADAB84F126CE2@thuja>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02
          Have you looked at Ripley's Pattern Recognition
and Neural Networks (PRNN)?  The first two chapters, especially pp.
32-34, seem relevant to these questions.

The following are my current speculations based in part on these first two chapters of PRNN:

1. As long as model assumptions are reasonable, I would expect the best approach to be full informative Bayes. Normal linear
mixtures are reasonably easy to handle in this regard.  Failing that, we
could approximate priors and posteriors as normal mixtures (which
should be relatively easy in many cases). A user who didn't like that could pursue Markov Chain Monte Carlo, though that may not be computationally feasible for many applications.
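
As a rough illustration of why the normal case is easy, here is a minimal sketch of the conjugate update for a single normal mean with known noise variance. The function name and the numbers are made up for illustration; a normal mixture prior would apply the same update to each component and then reweight the components.

```python
def normal_posterior(prior_mean, prior_var, data, noise_var):
    """Posterior of a normal mean under a normal prior, noise variance known.

    Hypothetical helper for illustration only.
    """
    n = len(data)
    xbar = sum(data) / n
    # Precisions (inverse variances) add; the posterior mean is the
    # precision-weighted average of the prior mean and the sample mean.
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + n * xbar / noise_var)
    return post_mean, post_var


# Made-up example: prior N(0, 1), three observations, noise variance 1.
post_mean, post_var = normal_posterior(0.0, 1.0, [1.0, 2.0, 3.0], 1.0)
```

No numerical integration is needed here, which is the sense in which the normal linear case is "reasonably easy".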

        2.  Alternatively, we could use something like the AIC =
(-2)*(log(likelihood)-q) where q = trace(solve(J, K)), with J = Fisher
Information and K = variance of the score function.  Ripley (1996, PRNN,
p. 32) observed, "If the true density belongs to the parametric family,
J = K." To be precise, to get J = K, we need other regularity conditions, e.g., being able to interchange the order of
integration and differentiation in taking expectation.  In this case,
trace(solve(J, K)) = the number of parameters estimated.
        From this context, it appears that Burnham and Anderson (2002,
pp. 65-70) suggest we replace J and K by the observed information and
estimated variance of the score function.  If we do this with the
standard normal linear model, Burnham and Anderson seem to suggest that
trace(solve(J, K)) = k*(1+(k+1)/(n-k-1)) where n = number of
observations and k = number of parameters estimated, including the noise
variance. I need to study PRNN (pp. 33-34) and Burnham & Anderson more before I can express an opinion about this.
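
To make the two penalties in item 2 concrete, here is a minimal sketch in plain Python. It assumes an i.i.d. normal model with parameters (mean, variance) fitted by maximum likelihood. J is taken as the average observed information and K as the average outer product of the per-observation scores; solve(J, K) is done with an explicit 2x2 inverse. The function names are hypothetical, and the second function only evaluates the small-sample expression quoted above without claiming that the two quantities coincide.

```python
def trace_penalty_normal(x):
    """q = trace(solve(J, K)) for an i.i.d. normal model at the MLE.

    Assumes the data are not all equal (so the fitted variance is > 0).
    """
    n = len(x)
    mu = sum(x) / n
    v = sum((xi - mu) ** 2 for xi in x) / n  # MLE of the variance
    J = [[0.0, 0.0], [0.0, 0.0]]  # average observed information
    K = [[0.0, 0.0], [0.0, 0.0]]  # average outer product of scores
    for xi in x:
        d = xi - mu
        # Score of one observation with respect to (mu, v).
        s = (d / v, -0.5 / v + d * d / (2.0 * v * v))
        # Negative Hessian of one observation's log density.
        h = ((1.0 / v, d / v ** 2),
             (d / v ** 2, -0.5 / v ** 2 + d * d / v ** 3))
        for a in range(2):
            for b in range(2):
                J[a][b] += h[a][b] / n
                K[a][b] += s[a] * s[b] / n
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    J_inv = [[J[1][1] / det, -J[0][1] / det],
             [-J[1][0] / det, J[0][0] / det]]
    # trace(J_inv K) = sum over a, b of J_inv[a][b] * K[b][a]
    return sum(J_inv[a][b] * K[b][a] for a in range(2) for b in range(2))


def small_sample_penalty(k, n):
    """The expression k*(1+(k+1)/(n-k-1)) quoted in the text."""
    return k * (1.0 + (k + 1.0) / (n - k - 1.0))


q_trace = trace_penalty_normal([-2.0, -1.0, 0.0, 1.0, 2.0])
q_small = small_sample_penalty(2, 5)
```

For this model the trace works out to 1 + (kurtosis - 1)/2, with kurtosis the sample fourth central moment divided by the squared variance, so it is close to 2 (the parameter count) only when the data look roughly normal. The small-sample expression depends on k and n alone.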

3. Since AIC and variants rely on asymptotic arguments, it would be instructive to carry a few more terms in asymptotic expansions for various alternatives and then compare the results with Monte Carlo. For example, Burnham and Anderson (p. 300) provide the following summary of mean square prediction error from using the best model vs. Bayesian Model Averaging using AIC.c and BIC:

        model-averaged   best model   ratio
AIC.c        4.85           5.68       0.85
BIC          5.88           7.66       0.77
ratio        0.83           0.74

In this study, model averaging gave 15 and 23% smaller mean square prediction errors than using the best model by itself, and the AIC.c, which they recommend, was 17 and 26% better than using the BIC, depending on whether model averaging or the best model was used. I'd like to see this kind of study expanded to include a full Bayesian procedure with a reasonable prior as well as AIC without the finite sample correction.
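
For readers who have not seen the averaging step, here is a minimal sketch of "Akaike weights" as I understand Burnham and Anderson to define them: each model gets weight proportional to exp(-delta/2), where delta is its criterion value minus the smallest value in the set. The criterion values and predictions below are made up for illustration, and the function names are hypothetical. The same function works with AIC, AIC.c or BIC values.

```python
import math


def akaike_weights(criterion_values):
    """Weights proportional to exp(-delta/2), normalized to sum to 1."""
    best = min(criterion_values)
    # Subtracting the minimum first avoids underflow for large values.
    raw = [math.exp(-0.5 * (c - best)) for c in criterion_values]
    total = sum(raw)
    return [r / total for r in raw]


def model_average(predictions, weights):
    """Weighted average of the candidate models' predictions."""
    return sum(w * p for w, p in zip(weights, predictions))


# Made-up criterion values and point predictions for three candidate models.
aic_values = [100.0, 102.0, 110.0]
predictions = [1.0, 2.0, 4.0]

weights = akaike_weights(aic_values)
averaged = model_average(predictions, weights)
best_only = predictions[aic_values.index(min(aic_values))]
```

The "best model" column in the table corresponds to using best_only; the "model-averaged" column corresponds to using averaged.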

          Comments?
hope this helps.  spencer graves

Huso, Manuela wrote:
 > Hello, all,
 >
 > I am a statistician whose job it is to consult with researchers in
 > natural resources, primarily forestry and wildlife, about study design
 > and analysis.  Burnham and Anderson's book entitled 'Model Selection and
 > Inference: a Practical Information-Theoretic Approach' has caused quite
 > a stir, particularly in the wildlife community and I have people wanting
 > to apply the technique in every possible situation.
 >
 > I understand that Dr. Ripley has urged extreme caution in following
 > B&A's guidelines.  I am writing to ask for some specific points of
 > criticism and/or suggestions of literature that I might read to be able
 > to form an educated opinion of where their techniques can/should be
 > applied, where they shouldn't and how to know the difference.
 >
 > I am also particularly interested in model averaging concepts and
 > their advantages and limitations in both the AIC and BIC context.
 >
 > Many thanks for your help and I hope I don't start a flood :-)
 >
 > Manuela
 > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 > Manuela Huso
 > Consulting Statistician
 > 201H Richardson Hall
 > Department of Forest Science
 > Oregon State University
 > Corvallis, OR   97331-5752
 > ph: 541-737-6232
 > fx: 541-737-1393
 >
 >
 > -----Original Message-----
 > From: Spencer Graves [mailto:spencer.graves@PDF.COM]
 > Sent: Monday, July 14, 2003 7:48 AM
 > To: Mary Wisz
 > Cc: s-news@lists.biostat.wustl.edu
 > Subject: Re: [S] model averaging and all- subsets glm's
 >
 >
 > ...
 >     2.  On 6/25/2003, Brian Ripley expressed concern about Burnham and
 > Anderson's book in a thread "logLik.lm()";  see below.  I'm currently a
 > third of the way through reading Pattern Recognition and Neural
 > Networks, recommended by Ripley below.  Using a full Bayesian approach
 > (integrating out parameters, etc.) should be easy for "lm".  With
 > something like "glm", this would be much harder, requiring, e.g.,
 > Hermite polynomial integration with saddle point approximations or
 > Markov Chain Monte Carlo.
 >
 > hope this helps.  spencer graves
 >
 >  > Dear Prof. Ripley:
 >  >
 >  >       I gather you disagree with the observation in Burnham and Anderson
 >  > (2002, ch. 2) that the "complexity penalty" in the Akaike Information
 >  > Criterion is a bias correction, and with this correction, they can use
 >  > "density = exp(-AIC/2)" to compute approximate posterior probabilities
 >  > comparing even different distributions?
 >
 > That's the derivation of BIC and similar, not AIC.
 >
 >  >       They use this even to compare discrete and continuous distributions,
 >  > which makes no sense to me.  However, with a common dominating measure,
 >  > it seems sensible to me.  They cite a growing literature on "Bayesian
 >  > model averaging".  What I've seen of this claims that Bayesian model
 >  > averaging produces better predictions than predictions based on any
 >  > single model, even using these approximate posteriors ("Akaike weights")
 >  > in place of full Bayesian posteriors.
 >  >
 >  >       I don't have much experience with this, but so far, I seem to have
 >  > gotten great, informative answers to my clients' questions.  If there
 >  > are serious deficiencies with this kind of procedure, I'd like to know.
 >
 > Yes, model averaging is useful, but is nothing to do with AIC nor Burnham
 > & Anderson.  See e.g. my PRNN book for better ways to do it.
 >
 > Burnham & Anderson (2002) is a book I would recommend people NOT to read
 > until they have read the primary literature.  I see no evidence that the
 > authors have actually read Akaike's papers.
 >
 >




