s-news

Re: [S] Missing factor levels and predict.

To: Gérald Jean <Gerald.Jean@spgdag.ca>
Subject: Re: [S] Missing factor levels and predict.
From: Frank E Harrell Jr <fharrell@virginia.edu>
Date: Wed, 23 Feb 2000 09:50:50 -0800
Cc: s-news@wubios.wustl.edu
Organization: University of Virginia
References: <8525688E.005D4A3E.00@mail.spgdag.com>
Sender: owner-s-news@wubios.wustl.edu

One approach would be to make a prediction that is unconditional on that
variable, since you have no way to make an estimate for a cell that did not
exist in the training data.  With the Design library's contrast function it is
easy to get predictions like this, with confidence intervals:

# In a model containing age, race, and sex,
# compute an estimate of the mean response for a
# 50 year old male, averaged over the races using
# observed frequencies for the races as weights

f <- ols(y ~ age + race + sex)
contrast(f, list(age=50, sex='male', race=levels(race)),
         type='average', weights=table(race))

It would not be too hard to do this with predict() for any S-Plus method,
provided you also get the "predicted" design matrix, so that you can average its
rows to compute the proper variances of weighted or unweighted marginal predicted
means.  The marginal means are estimated expected values in the absence
of knowledge of the factor in question.  -Frank
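The row-averaging idea in the paragraph above can be sketched in R, a dialect of S, using only base functions (lm, model.matrix, coef, vcov). This is a minimal sketch assuming an ordinary linear model, not the Design library's contrast implementation; the simulated data and the names d, grid, w, xbar are hypothetical, and the code may need adapting for S-Plus.

```r
# Hypothetical sketch: weighted marginal predicted mean and its standard error,
# obtained by averaging rows of the "predicted" design matrix of an lm() fit.
# The simulated data below is made up for illustration only.
set.seed(1)
n    <- 200
race <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
sex  <- factor(sample(c("female", "male"), n, replace = TRUE))
age  <- round(runif(n, 20, 80))
y    <- 10 + 0.1 * age + as.numeric(race) + 2 * (sex == "male") + rnorm(n)
d    <- data.frame(y, age, race, sex)

fit <- lm(y ~ age + race + sex, data = d)

# One row per race level, holding age = 50 and sex = "male" fixed
grid <- data.frame(age  = 50,
                   sex  = factor("male", levels = levels(d$sex)),
                   race = factor(levels(d$race), levels = levels(d$race)))

# "Predicted" design matrix for those rows (columns line up with coef(fit))
X <- model.matrix(delete.response(terms(fit)), grid, xlev = fit$xlevels)

# Weights = observed race frequencies; use rep(1/nrow(X), nrow(X)) for unweighted
w <- as.vector(table(d$race)) / nrow(d)

xbar <- matrix(w, nrow = 1) %*% X                    # weighted average row
est  <- drop(xbar %*% coef(fit))                     # marginal predicted mean
se   <- sqrt(drop(xbar %*% vcov(fit) %*% t(xbar)))   # uses full coefficient covariance
ci   <- est + c(-1, 1) * qt(0.975, df.residual(fit)) * se
```

The point estimate equals the weighted average of the per-level predictions; building X explicitly matters for the variance, since xbar V xbar' accounts for the covariances among the estimated coefficients.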

"Gérald Jean" wrote:

> Hello S-users,
>
> A glm was developed using a dataset where one of the predicting factors had a
> level with no observations; consequently there was no parameter estimate for
> that level.  Now I am trying to predict, using that model, on a new dataset
> where every level of that factor is present.  I get the following error, which
> of course makes sense!
>
> > ttt_predict.glm(c994.freq.glm.final, newdata = acr9597.all,
> + type = 'response')
> Error in model.frame.default(terms.object, data, x..: factor mtass has new level(s)  069 = Moins de 70K
>
> Is there a way around this?  I know that I could combine the "new" level with
> the closest one, but the same data set has to be scored with several other
> models in which the offending level is present.  I would be very happy if I
> could manually insert a coefficient for the missing level into the existing
> model (it could be eyeballed by looking at the other coefficients for that
> factor).  Is this possible, and how?
>
> Any help appreciated, I'll summarize to the list,
>
> Gérald Jean
> Consulting Analyst (Statistics), Actuarial Department
> Telephone: (418) 835-8839
> Fax:       (418) 835-5865
> E-mail:    gerald.jean@spgdag.ca
>
> "In God we trust; all others must bring data."
>
> -----------------------------------------------------------------------
> This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
> send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
> message:  unsubscribe s-news

--
Frank E Harrell Jr
Professor of Biostatistics and Statistics
Division of Biostatistics and Epidemiology
Department of Health Evaluation Sciences
University of Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat

