s-news
[Top] [All Lists]

factor variables in predict.glm

To: s-news@lists.biostat.wustl.edu
Subject: factor variables in predict.glm
From: Tobias Meyer <tobias.meyer@fowi.ethz.ch>
Date: Mon, 30 Apr 2001 16:51:55 +0200
The outset:
I am fitting logistic regression models with 1 metric variable and >2 factor variables containing > 2 levels each. Using glm on s-plus 4.5, NT 4.0 .

On behave of a nice presentation of my models (together with confidence limits) I constructed a new dataframe containing all the variables used for modelling but specializing on a specific level for each factor variable.

The problem:
In general, predict.glm does not use the coefficients of the desired levels and returns a wrong prediction.

According to the test case below, predict.glm does find the appropriate coefficient when the new data includes all of the factor levels in a lead.

In real life predict.glm failed yet again with new data containing > 2 factor variables.

Do I omit a point or is there a bug?

Thank You for Your help

test case:

options(contrasts=c("contr.treatment","contr.poly" ))
# "orginal data"################################################
aux <- (seq(-4,,0.2,99)+rnorm(99))
y <- exp(aux)/(1+exp(aux))
plot(y)
x <- seq(1,,1,99)
z <- c(rep(c("A","B","C"),33))
mydata <- data.frame(y,x,z=factor(z))

# model #########################################################
model <- glm(y ~ x + z  , family = binomial(link = logit),
data = mydata, na.action = "omit", control = glm.control(epsilon = 0.0001, maxit = 20, trace = T),
model = T, y = T)
anova(model)
summary(model)
dummy.coef(model)

# predict with original data ####################################
original <- data.frame(predict.glm(model,  type = "link", se.fit = T ) ,mydata)
#################################################################
# predict with new data with only one level of factor z present
# construct new data
newx <- seq(1,,1,99)
newz <- c(rep("A",99))
newdat <- data.frame(x = newx, z = factor(newz))

# reccords with z = "A" are not predicted correctly
predicted <- data.frame(predict.glm(model, newdat, type = "link", se.fit = T ) ,newdat,original)
model$coefficients
##################################################################
# predict with new data with three leading records representing all factor z levels,
# body with only one level of factor z present
# construct new data
new2x <- seq(1,,1,99)
new2z <- c("A","B","C",rep("A",96))
newdata2 <- data.frame(x = new2x, z = factor(new2z))

# newdata2 records with z = "a" are precicted correctly.
predicted2 <- data.frame(predict.glm(model, newdata2, type = "link", se.fit = T ) ,newdata2,original)

********************************************
E-Mail-Adresse:
tobias.meyer@fowi.ethz.ch

Tobias Meyer
dipl. geogr. Universitaet Zuerich
D-FOWI
Professur Forstliches Ingenieurwesen
HG G 21.1
ETH Zentrum
CH-8092 Zuerich
Tel +41 1 632 69 23
Fax +41 1 632 11 46

*********************************************

<Prev in Thread] Current Thread [Next in Thread>