Have you looked at Ripley's Pattern Recognition
and Neural Networks (PRNN)? The first two chapters, especially pp.
32-34, seem relevant to these questions.
The following are my current speculations based in part on these
first two chapters of PRNN:
1. As long as model assumptions are reasonable, I would expect
that the best might be full informative Bayes. Normal linear
mixtures are reasonably easy to handle in this regard. Failing that, we
could approximate priors and posteriors as normal mixtures (which
should be relatively easy in many cases). A user who didn't like that
could pursue Markov Chain Monte Carlo, though that may not be
computationally feasible for many applications.
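
To illustrate why normal mixtures are easy to handle here, the following is a minimal Python sketch (not anything taken from PRNN). It assumes a single observation y ~ N(theta, noise_var) with known noise variance and a normal-mixture prior on theta; the posterior is then again a normal mixture, with each component updated by the usual conjugate formulas and reweighted by how well it predicted y. The function names and the two-component prior are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def update_mixture(prior, y, noise_var):
    """Posterior for theta given one observation y ~ N(theta, noise_var).

    prior is a list of (weight, mean, var) components; the result has the
    same form, with weights renormalised to sum to 1.
    """
    posterior = []
    for w, m, v in prior:
        # Marginal density of y under this component: N(m, v + noise_var).
        evidence = normal_pdf(y, m, v + noise_var)
        # Conjugate normal update of the component's mean and variance.
        post_var = 1.0 / (1.0 / v + 1.0 / noise_var)
        post_mean = post_var * (m / v + y / noise_var)
        posterior.append((w * evidence, post_mean, post_var))
    total = sum(w for w, _, _ in posterior)
    return [(w / total, m, v) for w, m, v in posterior]

# Hypothetical two-component prior: equal weight on theta near -2 and near +2.
prior = [(0.5, -2.0, 1.0), (0.5, 2.0, 1.0)]
posterior = update_mixture(prior, y=1.5, noise_var=1.0)
for w, m, v in posterior:
    print(f"weight={w:.3f} mean={m:.3f} var={v:.3f}")
```

Because the posterior has the same form as the prior, the update can be repeated observation by observation without any numerical integration.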
2. Alternatively, we could use something like the AIC =
(-2)*(log(likelihood)-q), where q = trace(solve(J, K)), J = Fisher
Information, and K = variance of the score function. Ripley (1996, PRNN,
p. 32) observed, "If the true density belongs to the parametric family,
J = K." To be precise, to get J = K, we need other regularity
conditions, e.g., being able to interchange the order of
integration and differentiation in taking expectation. In this case,
trace(solve(J, K)) = the number of parameters estimated.
From this context, it appears that Burnham and Anderson (2002,
pp. 65-70) suggest we replace J and K by the observed information and
estimated variance of the score function. If we do this with the
standard normal linear model, Burnham and Anderson seem to suggest that
trace(solve(J, K)) = k*(1+(k+1)/(n-k-1)) where n = number of
observations and k = number of parameters estimated, including the noise
variance. I need to study PRNN (pp. 33-34) and Burnham & Anderson more
before I can express an opinion about this.
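
As a rough numerical check of the two penalties in this item, here is a small Python sketch using only the standard library. trace_solve mirrors trace(solve(J, K)), i.e. trace(J^{-1} K), and small_sample_q evaluates the expression k*(1+(k+1)/(n-k-1)) quoted above. The matrices and the values of k and n are made up for illustration, and the function names are my own.

```python
def trace_solve(J, K):
    """Return trace(J^{-1} K), i.e. trace(solve(J, K)), for square lists of lists."""
    p = len(J)
    # Augmented matrix [J | K]; Gauss-Jordan elimination turns it into [I | J^{-1} K].
    A = [list(map(float, J[i])) + list(map(float, K[i])) for i in range(p)]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry in this column up.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if A[pivot][col] == 0.0:
            raise ValueError("J is singular")
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [x / piv for x in A[col]]
        for r in range(p):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return sum(A[i][p + i] for i in range(p))

def aic(loglik, q):
    """AIC in the form used above: (-2) * (log(likelihood) - q)."""
    return -2.0 * (loglik - q)

def small_sample_q(k, n):
    """Penalty k * (1 + (k + 1) / (n - k - 1)), which equals k * n / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return k * (1.0 + (k + 1.0) / (n - k - 1.0))

# Hypothetical 2-parameter example: when J = K the penalty is the parameter count.
J = [[2.0, 1.0], [1.0, 3.0]]
print(trace_solve(J, J))

# The small-sample penalty approaches k as n grows.
for n in (20, 200, 2000):
    print(n, small_sample_q(3, n))
```

With J = K the trace reduces to the number of parameters, and the small-sample expression tends to k as n increases, so the two penalties agree asymptotically.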
3. Since AIC and variants rely on asymptotic arguments, it would be
instructive to carry a few more terms in asymptotic expansions for
various alternatives and then compare the results with Monte Carlo. For
example, Burnham and Anderson (p. 300) provide the following summary of
mean square prediction error from using the best model vs. Bayesian
Model Averaging using AIC.c and BIC:
          model    best
          av'g     model    ratio
AIC.c     4.85     5.68     0.85
BIC       5.88     7.66     0.77
ratio     0.83     0.74
In this study, model averaging gave 15 and 23% smaller mean square
prediction errors than using the best model by itself, and the AIC.c,
which they recommend, was 17 and 26% better than using the BIC,
depending on whether model averaging or the best model was used. I'd
like to see this kind of study expanded to include a full Bayesian
procedure with a reasonable prior as well as AIC without the finite
sample correction.
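
For readers who have not seen how such averaging weights are formed, here is a minimal Python sketch. It assumes weights proportional to exp(-IC/2), the "density = exp(-AIC/2)" idea discussed in the quoted thread below, applied to whichever criterion is in use (AIC, AIC.c or BIC). The criterion values and per-model predictions are hypothetical and are not taken from Burnham and Anderson's study.

```python
import math

def ic_weights(ic_values):
    """Weights proportional to exp(-IC/2), normalised to sum to 1."""
    best = min(ic_values)
    # Subtracting the minimum leaves the weights unchanged and avoids underflow.
    raw = [math.exp(-0.5 * (ic - best)) for ic in ic_values]
    total = sum(raw)
    return [r / total for r in raw]

def averaged_prediction(weights, predictions):
    """Weighted average of the per-model predictions."""
    return sum(w * p for w, p in zip(weights, predictions))

# Hypothetical criterion values and point predictions for three candidate models.
aicc = [100.0, 102.0, 110.0]
preds = [3.1, 2.7, 4.0]

w = ic_weights(aicc)
avg = averaged_prediction(w, preds)
best_only = preds[aicc.index(min(aicc))]
print("weights:", [round(x, 4) for x in w])
print("model-averaged prediction:", avg)
print("best-model prediction:", best_only)
```

Only differences between criterion values matter, so adding a constant to every value leaves the weights unchanged.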
Comments?
hope this helps. spencer graves
Huso, Manuela wrote:
> Hello, all,
>
> I am a statistician whose job it is to consult with researchers in
> natural resources, primarily forestry and wildlife, about study design
> and analysis. Burnham and Anderson's book entitled 'Model Selection and
> Inference: a Practical Information-Theoretic Approach' has caused quite
> a stir, particularly in the wildlife community, and I have people
> wanting to apply the technique in every possible situation.
>
> I understand that Dr. Ripley has urged extreme caution in following
> B&A's guidelines. I am writing to ask for some specific points of
> criticism and/or suggestions of literature that I might read to be able
> to form an educated opinion of where their techniques can/should be
> applied, where they shouldn't and how to know the difference.
>
> I am also particularly interested in model averaging concepts and
> their advantages and limitations in both the AIC and BIC context.
>
> Many thanks for your help and I hope I don't start a flood :-)
>
> Manuela
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Manuela Huso
> Consulting Statistician
> 201H Richardson Hall
> Department of Forest Science
> Oregon State University
> Corvallis, OR 97331-5752
> ph: 541-737-6232
> fx: 541-737-1393
>
>
> -----Original Message-----
> From: Spencer Graves [mailto:spencer.graves@PDF.COM]
> Sent: Monday, July 14, 2003 7:48 AM
> To: Mary Wisz
> Cc: s-news@lists.biostat.wustl.edu
> Subject: Re: [S] model averaging and all- subsets glm's
>
>
> ...
> 2. On 6/25/2003, Brian Ripley expressed concern about Burnham and
> Anderson's book in a thread "logLik.lm()"; see below. I'm currently a
> third of the way through reading Pattern Recognition and Neural
> Networks, recommended by Ripley below. Using a full Bayesian approach
> (integrating out parameters, etc.) should be easy for "lm". With
> something like "glm", this would be much harder, requiring, e.g.,
> Hermite polynomial integration with saddle point approximations or
> Markov Chain Monte Carlo.
>
> hope this helps. spencer graves
>
> > Dear Prof. Ripley:
> >
> > I gather you disagree with the observation in Burnham and Anderson
> > (2002, ch. 2) that the "complexity penalty" in the Akaike Information
> > Criterion is a bias correction, and with this correction, they can use
> > "density = exp(-AIC/2)" to compute approximate posterior probabilities
> > comparing even different distributions?
>
> That's the derivation of BIC and similar, not AIC.
>
> > They use this even to compare discrete and continuous distributions,
> > which makes no sense to me. However, with a common dominating measure,
> > it seems sensible to me. They cite a growing literature on "Bayesian
> > model averaging". What I've seen of this claims that Bayesian model
> > averaging produces better predictions than predictions based on any
> > single model, even using these approximate posteriors ("Akaike
> > weights") in place of full Bayesian posteriors.
> >
> > I don't have much experience with this, but so far, I seem to have
> > gotten great, informative answers to my clients' questions. If there
> > are serious deficiencies with this kind of procedure, I'd like to know.
>
> Yes, model averaging is useful, but is nothing to do with AIC nor
> Burnham & Anderson. See e.g. my PRNN book for better ways to do it.
>
> Burnham & Anderson (2002) is a book I would recommend people NOT to read
> until they have read the primary literature. I see no evidence that the
> authors have actually read Akaike's papers.
>
>