On Wed, 26 Jan 2005, Henrik Parn wrote:
> Dear S-PLUS users,
>
> I want to calculate AIC on a linear regression model. The data set has
> some NAs in the response variable 'yy' and some NAs in the predictor 'x'.
> Therefore, as I normally do when I have NAs, I used na.action=na.exclude in
> the call to lm():
> AIC(lm(yy ~ x, data=datasett, na.action=na.exclude))
> However, the answer is:
> [1] NA
> Since the S-PLUS help says 'the na.omit and na.exclude functions return the
> same data frame',

But it does not say that lm(na.action=na.exclude) and
lm(na.action=na.omit) return the same results (or what would be the
point of both)?

> I didn't consider my choice of na.action as the cause of my
> problem. However, at last (after desperately trying ALL other possible and
> impossible ways), I tried na.omit, and voilà:
> > AIC(lm(yy ~ x, data=datasett, na.action=na.omit))
> [1] 21.68606
> So I wonder: is my 'only' problem here that na.exclude and na.omit aren't
> really interchangeable, or is something else going on?

They are not interchangeable, but the logLik and AIC should be the same.
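A minimal sketch of the difference, run in current R rather than S-PLUS, and assuming a small hypothetical 'datasett' (the values below are made up for illustration). Both na.action choices drop the same rows before fitting, so the fit itself is identical; na.exclude only changes what the extractor resid() returns, by padding NAs back in for the dropped cases.

```r
# Hypothetical toy data: one NA in the response, one NA in the predictor
datasett <- data.frame(yy = c(1.2, 2.3, NA, 4.1, 5.2, 5.9, 7.4),
                       x  = c(1, 2, 3, NA, 5, 6, 7))

fit.omit <- lm(yy ~ x, data = datasett, na.action = na.omit)
fit.excl <- lm(yy ~ x, data = datasett, na.action = na.exclude)

# Same fit: rows 3 and 4 are dropped before fitting in both cases
all.equal(coef(fit.omit), coef(fit.excl))   # TRUE

# The difference is in what resid() returns
length(resid(fit.omit))   # 5: one residual per case used in the fit
length(resid(fit.excl))   # 7: NA re-inserted for rows 3 and 4

# R's logLik.lm uses only the fitted cases, so AIC agrees
all.equal(AIC(fit.omit), AIC(fit.excl))     # TRUE in current R
```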
However, there is a known (to me at least) bug in S-PLUS's logLik.lm
(taken from nlme, I believe, as that had the same problem), which AIC.lm
calls. It starts
    res <- resid(object)
    p <- object$rank
    N <- length(res)
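and so both the residuals and N include the cases omitted before fitting:
with na.exclude, resid() re-inserts an NA for each omitted case, so
sum(res^2), and with it the log-likelihood and AIC, comes out as NA.
Hint to Insightful: it should be res <- object$residuals.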
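To make the failure concrete, here is a rough sketch in R of the unweighted Gaussian log-likelihood that logLik.lm computes, evaluated once on the NA-padded resid(object) and once on object$residuals. The helpers gauss.loglik and aic.sketch are hypothetical names for illustration, not S-PLUS or R functions, and the data are made up; the comparison with AIC() assumes current R, whose logLik.lm already uses object$residuals.

```r
# Hypothetical data with NAs in both yy and x
datasett <- data.frame(yy = c(1.2, 2.3, NA, 4.1, 5.2, 5.9, 7.4),
                       x  = c(1, 2, 3, NA, 5, 6, 7))
fit <- lm(yy ~ x, data = datasett, na.action = na.exclude)

# Hypothetical helper: unweighted Gaussian log-likelihood evaluated at
# the ML estimate of sigma^2 (RSS / N)
gauss.loglik <- function(res) {
  N <- length(res)
  -N/2 * (log(2 * pi) + 1 - log(N) + log(sum(res^2)))
}

gauss.loglik(resid(fit))      # NA: padded residuals, N = 7
gauss.loglik(fit$residuals)   # finite: only the 5 fitted cases, N = 5

# Hypothetical helper: AIC counting the p coefficients plus sigma^2
aic.sketch <- function(fit) -2 * gauss.loglik(fit$residuals) + 2 * (fit$rank + 1)
all.equal(aic.sketch(fit), AIC(fit))   # TRUE in current R
```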
--
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595