Hi S-users,
S-PLUS 6 on Windows NT 4.
I have a very large data set consisting of several factor variables
and four continuous variables. The data come from transactional data,
and the continuous variables are aggregated (summed) over the factors. A
large model has been fitted to this data, and since I needed to do further work
with only a few of the factors, I re-aggregated over those factors. My
questions:
Why do I get different coefficients when I fit a two-factor model (in this
example) to the big data set than when I fit the same model to the smaller,
re-aggregated data set?
> inc.freq.glm.all <- glm(nsininct / unsousi ~ bmm + zonecong,
+ family = poisson(link = log),
+ data = inc.all.agg, weights = unsous)
> inc.all.agg.2v <- aggregate(inc.all.agg[, c('unsous', 'unsousi',
+ 'nsininct', 'en15incf')],
+ by = list(bmm = inc.all.agg[, 'bmm'],
+ zonecong = inc.all.agg[, 'zonecong']),
+ FUN = sum)
> inc.freq.glm.2v <- glm(nsininct / unsousi ~ bmm + zonecong,
+ family = poisson(link = log),
+ data = inc.all.agg.2v, weights = unsous)
> ttt.merge <- merge(ttt.all, ttt.2v, by = 'row.names')
> row.names(ttt.merge) <- ttt.merge[, 1]
> ttt.merge <- ttt.merge[, -1]
> round(ttt.merge, 5)
              Value.x Std..Error.x  t.value.x  Value.y Std..Error.y  t.value.y
(Intercept)  -5.07729      0.01282 -395.94985 -5.07304      0.01291 -393.05116
bmm Erreur    0.15913      0.03744    4.24999  0.14441      0.03781    3.81901
bmm Mauvais   0.73437      0.05390   13.62538  0.73928      0.05371   13.76433
bmm Montreal -0.36010      0.03419  -10.53170 -0.36659      0.03495  -10.48868
bmm Moyen     0.40942      0.03096   13.22550  0.41195      0.03096   13.30679
zonecong     -0.01763      0.08312   -0.21216 -0.00122      0.08401   -0.01457
I don't understand why the coefficients differ. I checked that the sums
over the factors are correct, and they are.
Looking at the coefficients and the t-values, it almost looks as if "bmm
Montreal" and "bmm Moyen" had been swapped in the aggregation process.
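The check of the sums can be sketched as follows. This is a minimal
illustration on simulated data, not the poster's code: the data frame
`fine`, the extra factor `other` and the level "Bon" are made up, and only
the column names mimic the post. It is written in R syntax and has not been
tried in S-PLUS 6. It compares grand totals and also per-cell totals, since
matching grand totals alone would not catch rows landing in the wrong
(bmm, zonecong) cell.

```r
# Simulated fine-grained data (hypothetical; names only mimic the post)
set.seed(1)
n <- 200
fine <- data.frame(
  bmm      = factor(sample(c("Bon", "Moyen", "Mauvais"), n, replace = TRUE)),
  zonecong = sample(0:1, n, replace = TRUE),
  other    = factor(sample(letters[1:4], n, replace = TRUE)),  # dropped by re-aggregation
  unsous   = runif(n, 1, 10),
  nsininct = rpois(n, 2)
)
vars <- c("unsous", "nsininct")

# Re-aggregate over bmm and zonecong only, as in the session above
coarse <- aggregate(fine[, vars],
                    by = list(bmm = fine$bmm, zonecong = fine$zonecong),
                    FUN = sum)

# 1. Grand totals must be unchanged
totals.ok <- isTRUE(all.equal(apply(fine[, vars], 2, sum),
                              apply(coarse[, vars], 2, sum)))

# 2. Totals within each (bmm, zonecong) cell must match row by row
cell        <- paste(fine$bmm, fine$zonecong)
by.cell     <- sapply(vars, function(v) tapply(fine[[v]], cell, sum))
coarse.cell <- paste(coarse$bmm, coarse$zonecong)
cells.ok <- isTRUE(all.equal(by.cell[coarse.cell, ], as.matrix(coarse[, vars]),
                             check.attributes = FALSE))

stopifnot(totals.ok, cells.ok)
```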
The division by "unsousi" is not a typo: this variable is "unsous" adjusted
for trends, inflation and other factors deemed appropriate by the
actuaries.
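A small simulated sketch of the comparison being described may help isolate
the role of the adjusted denominator. Everything here is hypothetical: the
data are random, the level "Bon" is invented, and the stand-in `unsousi` is
simply `unsous` times a random row-level factor. It is written in R syntax
and untested in S-PLUS 6. The same Poisson rate model (weights = unsous) is
fitted on fine cells and on cells re-aggregated over bmm and zonecong, once
with the rate denominator equal to the weight and once with the adjusted
denominator. When the denominator equals the weight, and the covariates are
constant within each aggregated cell, the weighted score equations depend on
the data only through the cell sums of nsininct and unsous, so the
coefficients should agree; with a different denominator that property does
not hold in general.

```r
# Simulated data (hypothetical; names only mimic the post)
set.seed(42)
n <- 400
fine <- data.frame(
  bmm      = factor(sample(c("Bon", "Moyen", "Mauvais"), n, replace = TRUE)),
  zonecong = sample(0:1, n, replace = TRUE),
  unsous   = runif(n, 1, 10)
)
# Hypothetical adjusted exposure: unsous times a factor that varies by row
fine$unsousi  <- fine$unsous * runif(n, 0.8, 1.3)
fine$nsininct <- rpois(n, 0.2 * fine$unsousi)

# Poisson rate model nsininct / <denom> ~ bmm + zonecong, prior weights unsous.
# suppressWarnings: R's poisson family warns about non-integer responses
# when it computes the AIC; the fit itself is unaffected.
fit.rate <- function(d, denom) {
  d$rate <- d$nsininct / d[[denom]]
  suppressWarnings(glm(rate ~ bmm + zonecong, family = poisson(link = log),
                       data = d, weights = unsous))
}

# Re-aggregate over the two model factors
coarse <- aggregate(fine[, c("unsous", "unsousi", "nsininct")],
                    by = list(bmm = fine$bmm, zonecong = fine$zonecong),
                    FUN = sum)

# (a) denominator equal to the weight
same <- cbind(fine   = coef(fit.rate(fine, "unsous")),
              coarse = coef(fit.rate(coarse, "unsous")))
# (b) denominator different from the weight
adj  <- cbind(fine   = coef(fit.rate(fine, "unsousi")),
              coarse = coef(fit.rate(coarse, "unsousi")))
print(round(same, 5))
print(round(adj, 5))
```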
Thanks for any insights into this,
Gérald Jean
Consulting Analyst (Statistics), Actuarial
telephone: (418) 835-4900 ext. 7639
fax: (418) 835-6657
e-mail: gerald.jean@spgdag.ca
"In God we trust; all others must bring data." W. Edwards Deming