s-news

Poisson glm with aggregated data.

To: s-news@wubios.wustl.edu
Subject: Poisson glm with aggregated data.
From: Gerald.Jean@spgdag.ca
Date: Tue, 4 Dec 2001 16:55:16 -0500
Hi S-users,

S+6, on NT4

I have a large, very large data set consisting of several factor variables
and four continuous variables.  The data come from transactional records,
and the continuous variables are aggregated (summed) over the factors.  A
large model has been fitted to this data, and as I needed to do further work
with only a few of the factors, I re-aggregated over those factors.  My
questions:

Why do I get different coefficients when I fit a two-factor model (in this
example) to the big data set than when I fit the same model to the smaller,
re-aggregated data set?
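One possible explanation can be sketched from the score equations, assuming glm treats `weights` as prior weights that multiply each row's Poisson log-likelihood contribution (up to terms not involving the coefficients). Write y_i = nsininct, u_i = unsousi, w_i = unsous and r_i = y_i / u_i; index the (bmm, zonecong) cells by g, with cell totals Y_g, U_g and W_g. Rows in the same cell share the covariate vector x_g and the fitted mean mu_g, so the fit only "sees" one number per cell, and that number is not the same before and after aggregation:

```latex
\begin{aligned}
\ell(\beta) &= \sum_i w_i \left( r_i \eta_i - e^{\eta_i} \right),
  \qquad \eta_i = x_i^\top \beta, \quad \mu_i = e^{\eta_i} \\
\frac{\partial \ell}{\partial \beta}
  &= \sum_i w_i (r_i - \mu_i)\, x_i
   = \sum_g x_g \Big( \sum_{i \in g} w_i \frac{y_i}{u_i} - \mu_g W_g \Big) \\
\frac{\partial \ell_{\text{agg}}}{\partial \beta}
  &= \sum_g x_g \Big( W_g \frac{Y_g}{U_g} - \mu_g W_g \Big)
\end{aligned}
```

Under these assumptions the two fits coincide only when \sum_{i \in g} w_i y_i / u_i = W_g Y_g / U_g in every cell, for example when w_i = c\,u_i for some constant c. Because unsous and unsousi differ, correct cell sums alone would not guarantee identical coefficients.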

> inc.freq.glm.all <- glm(nsininct / unsousi ~ bmm + zonecong,
+                         family = poisson(link = log),
+                         data = inc.all.agg, weights = unsous)

> inc.all.agg.2v <- aggregate(inc.all.agg[, c('unsous', 'unsousi',
+                                             'nsininct', 'en15incf')],
+                             by = list(bmm = inc.all.agg[, 'bmm'],
+                               zonecong = inc.all.agg[, 'zonecong']),
+                             FUN = sum)

> inc.freq.glm.2v <- glm(nsininct / unsousi ~ bmm + zonecong,
+                        family = poisson(link = log),
+                        data = inc.all.agg.2v, weights = unsous)

> ttt.merge <- merge(ttt.all, ttt.2v, by = 'row.names')
> row.names(ttt.merge) <- ttt.merge[, 1]
> ttt.merge <- ttt.merge[, -1]
> round(ttt.merge, 5)

              Value.x Std..Error.x  t.value.x  Value.y Std..Error.y  t.value.y
 (Intercept) -5.07729      0.01282 -395.94985 -5.07304      0.01291 -393.05116
bmm Erreur    0.15913      0.03744    4.24999  0.14441      0.03781    3.81901
bmm Mauvais   0.73437      0.05390   13.62538  0.73928      0.05371   13.76433
bmm Montreal -0.36010      0.03419  -10.53170 -0.36659      0.03495  -10.48868
bmm Moyen     0.40942      0.03096   13.22550  0.41195      0.03096   13.30679
    zonecong -0.01763      0.08312   -0.21216 -0.00122      0.08401   -0.01457

I don't understand why the coefficients are not the same; I checked that
the sums over the factors are correct, and they are.
Looking at the coefficients and the t-values, it almost looks as if "bmm
Montreal" and "bmm Moyen" had been inverted in the aggregation process.
The division by "unsousi" is not a typo: this variable is "unsous" adjusted
for trends, inflation and other factors deemed appropriate by the
actuaries.
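A small numeric sketch of the same point, using made-up numbers rather than the poster's data: for a one-factor model with a separate mean per cell, the weighted Poisson maximum-likelihood estimate of a cell's rate is sum(w * y / u) / sum(w). The functions `fit_rows` and `aggregate` are hypothetical helpers; `aggregate` only mimics what `aggregate(..., FUN = sum)` does to the columns.

```python
from collections import defaultdict

# Hypothetical rows: (cell, claims y, adjusted exposure u, prior weight w).
# y, u and w play the roles of nsininct, unsousi and unsous; the numbers are invented.
ROWS = [
    ("A", 2, 10.0, 8.0),
    ("A", 3, 5.0, 2.0),
    ("B", 1, 4.0, 3.0),
    ("B", 4, 8.0, 1.0),
]


def fit_rows(rows):
    """Per-cell weighted Poisson MLE for response y/u with prior weight w.

    For a saturated one-factor model the estimate is sum(w * y / u) / sum(w).
    """
    num, den = defaultdict(float), defaultdict(float)
    for cell, y, u, w in rows:
        num[cell] += w * y / u
        den[cell] += w
    return {cell: num[cell] / den[cell] for cell in num}


def aggregate(rows):
    """Sum y, u and w within each cell, like aggregate(..., FUN = sum)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    for cell, y, u, w in rows:
        s = sums[cell]
        s[0] += y
        s[1] += u
        s[2] += w
    return [(cell, y, u, w) for cell, (y, u, w) in sums.items()]


if __name__ == "__main__":
    # Same cell totals, but different per-cell estimates.
    print("row-level fit: ", fit_rows(ROWS))
    print("aggregated fit:", fit_rows(aggregate(ROWS)))
    # With the weight equal to the exposure (w = u), the two fits agree.
    rows_wu = [(c, y, u, u) for c, y, u, _ in ROWS]
    print("w = u, rows:   ", fit_rows(rows_wu))
    print("w = u, agg.:   ", fit_rows(aggregate(rows_wu)))
```

With these numbers, cell A gives about 0.28 at row level versus 1/3 after aggregation, even though every column sums to the same totals. Setting the weight equal to the exposure makes the two fits agree, which matches the condition in the score equations.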

Thanks for any insights into this,

Gérald Jean
Consulting Analyst (Statistics), Actuarial Department
Telephone : (418) 835-4900 ext. 7639
Fax       : (418) 835-6657
E-mail    : gerald.jean@spgdag.ca

"In God we trust; all others must bring data."  W. Edwards Deming

