There is an even better bias correction for the log-normal than the
"slightly biased" one that David Lorenz mentions. It was developed by
Finney DJ. 1941. On the distribution of a variate whose logarithm is
normally distributed. Journal of the Royal Statistical Society
(supplement) 7: 155-161. It involves a Bessel function, for which see
the supplement to Parkhurst DF. 1998. Arithmetic versus geometric means
for environmental concentration data. Environmental Science and
Technology 32: 92A-98A. That paper, not the supplement, analyzes the
degree of bias of both corrections for different sample sizes and a few
distributions.
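
For a simple random sample (no regressors), Finney's estimator of the log-normal mean is usually written as exp(ybar) * g_n(s^2/2), where ybar and s^2 are the mean and variance of the logged data and g_n is an infinite series. The Python sketch below sums that series directly. The function names, tolerance, and term limit are illustrative choices, and the regression case needs a further generalization that is not shown, so check the details against the 1941 paper before relying on it.

```python
import math

def finney_g(t, n, tol=1e-12, max_terms=200):
    """Sum the series g_n(t) = 1 + (n-1)t/n + (n-1)^3 t^2 / (n^2 (n+1) 2!) + ...

    Each term is obtained from the previous one by multiplying by
    (n-1)^2 * t / (n * k * (n + 2k - 3)), for k = 1, 2, 3, ...
    """
    total = 1.0
    term = 1.0
    for k in range(1, max_terms + 1):
        term *= (n - 1) ** 2 * t / (n * k * (n + 2 * k - 3))
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

def finney_mean(sample):
    """Estimate the arithmetic mean of a log-normal sample (simple-sample case only)."""
    logs = [math.log(v) for v in sample]
    n = len(logs)
    ybar = sum(logs) / n
    s2 = sum((y - ybar) ** 2 for y in logs) / (n - 1)  # variance of the logs
    return math.exp(ybar) * finney_g(s2 / 2.0, n)

if __name__ == "__main__":
    data = [1.3, 2.7, 0.9, 5.4, 3.1, 1.8]  # made-up values for illustration
    print(finney_mean(data))
```

For large n the series approaches exp(t), which recovers the simpler exp(s^2/2) correction.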
David Parkhurst
David L Lorenz wrote:
Paul,
Correcting for transformation bias really depends on what you are
trying to do. The model that you specified gives an estimate of the
median response in log-space and in real-world space. If you need an
estimate of the mean response, then there are a couple of options, one
of which is Duan's smearing estimator.
Duan's method is to compute the estimate, add the value of each
residual to that estimate, back transform, and then take the mean of
those data as an estimate of the mean response. It works for any
transformation and is easy to implement in S-PLUS.
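
The smearing steps described above can be sketched as follows. This is a minimal illustration in Python rather than S-PLUS, assuming a single predictor and an ordinary least-squares fit of log(y) on x; fit_line and smearing_predict are hypothetical helper names, not library functions.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def smearing_predict(x, y, x_new):
    """Duan-style smearing estimate of the mean response at each value in x_new."""
    log_y = [math.log(v) for v in y]
    a, b = fit_line(x, log_y)
    # Residuals on the log scale.
    resid = [ly - (a + b * xi) for xi, ly in zip(x, log_y)]
    preds = []
    for x0 in x_new:
        fit0 = a + b * x0
        # Add every residual to the fitted value, back-transform, then average.
        preds.append(sum(math.exp(fit0 + e) for e in resid) / len(resid))
    return preds

if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [1.2, 2.9, 4.1, 8.5, 11.0]  # made-up values for illustration
    print(smearing_predict(x, y, [6.0]))
```

Because the residuals average to zero on the log scale, the smeared prediction is never smaller than the plain back-transformed fit exp(a + b*x0).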
If you can assume that the data are log-normally distributed, you can
compute a slightly biased back-transformation correction factor based on
the properties of the log-normal distribution. The correction factor is
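exp(s^2/2), where s^2 is the residual mean square (the estimated
residual variance on the log scale); the mean of the residuals themselves
is zero. Multiply the back-transformed estimate by that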
correction factor. I know that there are FORTRAN versions of a minimum
variance unbiased estimate (MVUE) of this correction factor, but I do
not know if any have been put into S-PLUS or R.
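
The log-normal correction is usually written as exp(s^2/2), with s^2 the residual variance on the log scale. A minimal Python sketch under that assumption follows; lognormal_correction and corrected_mean are hypothetical names, and dividing the residual sum of squares by n minus the number of fitted coefficients is one common choice rather than the only one.

```python
import math

def lognormal_correction(residuals, n_params):
    """Return exp(s^2 / 2), where s^2 = sum(e^2) / (n - n_params).

    residuals: log-scale residuals from the fitted model.
    n_params:  number of fitted coefficients, including the intercept.
    """
    n = len(residuals)
    s2 = sum(e * e for e in residuals) / (n - n_params)
    return math.exp(s2 / 2.0)

def corrected_mean(log_scale_fit, residuals, n_params):
    """Back-transform a log-scale prediction and apply the correction factor."""
    return math.exp(log_scale_fit) * lognormal_correction(residuals, n_params)

if __name__ == "__main__":
    resid = [0.10, -0.10, 0.20, -0.20]  # made-up residuals for illustration
    print(lognormal_correction(resid, n_params=2))
    print(corrected_mean(1.5, resid, n_params=2))
```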
Dave
From: "Schwarz,Paul" <PSchwarz@gcrinsight.com>
Sent by: s-news-owner@lists.biostat.wustl.edu
Date: 09/10/2006 02:25 AM
To: <s-news@lists.biostat.wustl.edu>
Subject: [S] predictions using log-transformed response variables
S-News readers,
I know that this is more of a statistical issue than an S-PLUS issue,
but I was hoping that someone would kindly summarize for me the issues
related to making predictions, e.g., using predict(), involving models
with a log-transformed response variable. For example, if a linear model
is fitted using lm(log(y) ~ x1 + x2, data= ...), what is the proper way
to make predictions using the model? I've heard about a so-called
"smear" factor, but I'm not clear about what it is, or when to apply it,
or how to calculate it. For example, are there standard S-PLUS functions
for calculating a smear factor, or is there an option with the predict
functions? If someone would clarify this issue for me, I would be most
grateful.
Thank you for your time and consideration.
-Paul Schwarz