The outset:
I am fitting logistic regression models with 1 metric variable and >2
factor variables containing > 2 levels each. Using glm on s-plus 4.5, NT 4.0 .
On behave of a nice presentation of my models (together with confidence
limits) I constructed a new dataframe containing all the variables used for
modelling but specializing on a specific level for each factor variable.
The problem:
In general, predict.glm does not use the coefficients of the desired levels
and returns a wrong prediction.
According to the test case below, predict.glm does find the appropriate
coefficient when the new data includes all of the factor levels in a lead.
In real life predict.glm failed yet again with new data containing > 2
factor variables.
Do I omit a point or is there a bug?
Thank You for Your help
test case:
options(contrasts=c("contr.treatment","contr.poly" ))
# "orginal data"################################################
aux <- (seq(-4,,0.2,99)+rnorm(99))
y <- exp(aux)/(1+exp(aux))
plot(y)
x <- seq(1,,1,99)
z <- c(rep(c("A","B","C"),33))
mydata <- data.frame(y,x,z=factor(z))
# model #########################################################
model <- glm(y ~ x + z , family = binomial(link = logit),
data = mydata, na.action = "omit", control = glm.control(epsilon = 0.0001,
maxit = 20, trace = T),
model = T, y = T)
anova(model)
summary(model)
dummy.coef(model)
# predict with original data ####################################
original <- data.frame(predict.glm(model, type = "link", se.fit = T ) ,mydata)
#################################################################
# predict with new data with only one level of factor z present
# construct new data
newx <- seq(1,,1,99)
newz <- c(rep("A",99))
newdat <- data.frame(x = newx, z = factor(newz))
# reccords with z = "A" are not predicted correctly
predicted <- data.frame(predict.glm(model, newdat, type = "link", se.fit =
T ) ,newdat,original)
model$coefficients
##################################################################
# predict with new data with three leading records representing all factor
z levels,
# body with only one level of factor z present
# construct new data
new2x <- seq(1,,1,99)
new2z <- c("A","B","C",rep("A",96))
newdata2 <- data.frame(x = new2x, z = factor(new2z))
# newdata2 records with z = "a" are precicted correctly.
predicted2 <- data.frame(predict.glm(model, newdata2, type = "link", se.fit
= T ) ,newdata2,original)
********************************************
E-Mail-Adresse:
tobias.meyer@fowi.ethz.ch
Tobias Meyer
dipl. geogr. Universitaet Zuerich
D-FOWI
Professur Forstliches Ingenieurwesen
HG G 21.1
ETH Zentrum
CH-8092 Zuerich
Tel +41 1 632 69 23
Fax +41 1 632 11 46
*********************************************
|