s-news

[S] Summary: Missing factor levels and predict.

To: s-news@wubios.wustl.edu
Subject: [S] Summary: Missing factor levels and predict.
From: "Gérald Jean" <Gerald.Jean@spgdag.ca>
Date: Thu, 24 Feb 2000 09:04:01 -0500
Sender: owner-s-news@wubios.wustl.edu
Yesterday I posted the attached message concerning missing factor levels and
prediction.  I received two answers: one from Frank Harrell Jr, attached, and
one from Stephen Smith suggesting I look at predict.gam.  predict.gam will work,
but comparing the output it produces (using type = 'terms') for the factor with
the missing level shows that the missing level gets a parameter estimate of 0.
The method suggested by Frank uses estimated expected values, which is fair in
the absence of knowledge about the missing level.

Thanks to Stephen and Frank

Gérald Jean
Analyste-conseil (statistiques), Actuariat
télephone            : (418) 835-8839
télecopieur          : (418) 835-5865
courrier électronique: gerald.jean@spgdag.ca

"In God we trust all others must bring data"

---------------
Original posting:

> Hello S-users,
>
> a glm was developed using a dataset where one of the predicting factors had a
> level with no observations, consequently there was no parameter estimate for
> that level.  Now I am trying to predict, using that model, on a new dataset
> where the above factor has no missing levels.  I get the following error,
> which of course makes sense!
>
> > ttt_predict.glm(c994.freq.glm.final, newdata = acr9597.all,
> + type = 'response')
> Error in model.frame.default(terms.object, data, x..: factor mtass has new
> level(s)  069 = Moins de 70K
>
> Is there a way around this?  I know that I could combine the "new" level with
> the closest one, but the same data set has to be scored with several other
> models where the offending missing level is present.  I would be very happy
> if I could manually insert a coefficient (it could be eyeballed by looking at
> the other coefficients for that factor) in the existing model for the missing
> level.  Is this possible, and if so, how?
>
> Any help appreciated, I'll summarize to the list,
>
> Gérald Jean
> Analyste-conseil (statistiques), Actuariat
> télephone            : (418) 835-8839
> télecopieur          : (418) 835-5865
> courrier électronique: gerald.jean@spgdag.ca
>
> "In God we trust all others must bring data"
>
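
For readers wondering what "manually inserting a coefficient" could look like,
here is a minimal sketch in modern R rather than S-Plus.  It is an illustration
only, not something either respondent proposed.  The helper name
score_with_extra_level, the simulated data, and the value 0.3 for the eyeballed
coefficient are all hypothetical, and the sketch assumes treatment contrasts,
no offset, and that the factor enters the model as a main effect only.

```r
# Simulated training data: factor g has levels "a" and "b" only.
set.seed(2)
train <- data.frame(
  x = rnorm(150),
  g = factor(sample(c("a", "b"), 150, replace = TRUE), levels = c("a", "b"))
)
train$y <- rpois(150, exp(0.1 + 0.4 * train$x + 0.6 * (train$g == "b")))
fit <- glm(y ~ x + g, family = poisson, data = train)

# Hypothetical helper: score newdata when `factor_name` contains one level
# (`new_level`) unseen in training, using a hand-chosen coefficient `beta_new`.
score_with_extra_level <- function(fit, newdata, factor_name, new_level,
                                   beta_new) {
  # Extend the stored factor levels with the unseen level.
  xl <- fit$xlevels
  xl[[factor_name]] <- c(xl[[factor_name]], new_level)
  newdata[[factor_name]] <- factor(as.character(newdata[[factor_name]]),
                                   levels = xl[[factor_name]])
  # Design matrix for the new data, now with a column for the unseen level.
  X <- model.matrix(delete.response(terms(fit)), newdata, xlev = xl)
  # Append the hand-chosen coefficient under the treatment-contrast name.
  beta <- coef(fit)
  beta[paste0(factor_name, new_level)] <- beta_new
  eta <- drop(X %*% beta[colnames(X)])   # linear predictor
  fit$family$linkinv(eta)                # back to the response scale
}

newdata <- data.frame(x = c(0, 0, 0), g = c("a", "b", "c"))
p <- score_with_extra_level(fit, newdata, "g", "c", beta_new = 0.3)
```

Rows whose level was seen in training get the same value as predict() with
type = 'response'; only rows with the new level depend on the chosen
coefficient, so no standard error is available for them.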


Frank Harrell's reply:

One approach would be to make a prediction that is un-conditioned on that
variable, as you have no way to make an estimate for a cell that didn't
exist in the training data.  With the Design library contrast function it is
easy to get predictions like this, with confidence intervals:

# In a model containing age, race, and sex,
# compute an estimate of the mean response for a
# 50 year old male, averaged over the races using
# observed frequencies for the races as weights

f <- ols(y ~ age + race + sex)
contrast(f, list(age=50, sex='male', race=levels(race)),
         type='average', weights=table(race))

It would not be too hard to do this with predict() for any S-Plus method,
if you also get the "predicted" design matrix so that you can average its rows
to compute the proper variances of weighted or unweighted marginal predicted
means.  The marginal means are estimated expected values in the absence
of knowledge of the factor in question.  -Frank
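
The row-averaging idea described above can be sketched roughly as follows.
This is modern R rather than S-Plus, and it is an illustration under stated
assumptions, not Frank's own code: the helper name marginal_predict and the
simulated data are hypothetical, the fit is assumed to be full rank, and the
averaging is done on the linear-predictor (link) scale.

```r
# Simulated training data: factor g was observed with levels "a" and "b" only.
set.seed(1)
train <- data.frame(
  x = rnorm(200),
  g = factor(sample(c("a", "b"), 200, replace = TRUE), levels = c("a", "b"))
)
train$y <- rpois(200, exp(0.2 + 0.3 * train$x + 0.5 * (train$g == "b")))
fit <- glm(y ~ x + g, family = poisson, data = train)

# Hypothetical helper: prediction for one new row, averaged over the observed
# levels of `factor_name`, weighted by their training frequencies.
marginal_predict <- function(fit, newrow, factor_name) {
  mf   <- model.frame(fit)
  freq <- table(mf[[factor_name]])
  freq <- freq[freq > 0]
  w    <- as.numeric(freq) / sum(freq)          # weights summing to 1
  # One copy of the new row per observed level of the factor.
  grid <- newrow[rep(1, length(freq)), , drop = FALSE]
  grid[[factor_name]] <- factor(names(freq),
                                levels = levels(mf[[factor_name]]))
  # "Predicted" design matrix, then the weighted average of its rows.
  X    <- model.matrix(delete.response(terms(fit)), grid, xlev = fit$xlevels)
  xbar <- drop(w %*% X)
  est  <- sum(xbar * coef(fit))                       # marginal mean, link scale
  se   <- sqrt(drop(t(xbar) %*% vcov(fit) %*% xbar))  # its standard error
  c(estimate = est, se = se)
}

new_obs <- data.frame(x = 0.5, g = factor("c"))  # "c" was never observed
res <- marginal_predict(fit, new_obs, "g")
```

Because the average is taken on the link scale, applying the inverse link to
the estimate is not the same as averaging the response-scale predictions; which
of the two is wanted depends on the application.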