s-news

Re: Poisson glm with aggregated data.

To: <Gerald.Jean@spgdag.ca>
Subject: Re: Poisson glm with aggregated data.
From: Prof Brian Ripley <ripley@stats.ox.ac.uk>
Date: Tue, 4 Dec 2001 22:42:35 +0000 (GMT)
Cc: <s-news@wubios.wustl.edu>
In-reply-to: <OF6F06EAF8.381D45AE-ON85256B18.007425B4@spgdag.ca>
glm has a very sloppy default convergence criterion.  You need to tighten
it, for example by setting epsilon = 1e-10 (an argument of glm.control),
to get reproducible results.
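
A minimal sketch of tightening the tolerance. It is written for R; the S-PLUS call should be similar but has not been checked here. The data frame `d` is simulated and hypothetical: it borrows the poster's column names, and the baseline level "Base" is invented. `glm.control()` supplies `epsilon`, the relative change in deviance at which iteration stops, and `maxit`, the iteration cap. The looser fit uses `epsilon = 1e-4` only for comparison.

```r
set.seed(1)

# Simulated, hypothetical stand-in for the poster's data
n  <- 2000
lv <- c("Base", "Erreur", "Mauvais", "Montreal", "Moyen")  # "Base" is invented
d <- data.frame(
  bmm      = factor(sample(lv, n, replace = TRUE), levels = lv),
  zonecong = rbinom(n, 1, 0.3),
  unsousi  = runif(n, 0.5, 5)                      # exposure
)
d$nsininct <- rpois(n, d$unsousi * exp(-2 + 0.5 * (d$bmm == "Mauvais")))

form <- nsininct ~ bmm + zonecong + offset(log(unsousi))

# Loose tolerance versus a tight one (maxit raised so it can reach 1e-10)
fit_loose <- glm(form, family = poisson(), data = d,
                 control = glm.control(epsilon = 1e-4))
fit_tight <- glm(form, family = poisson(), data = d,
                 control = glm.control(epsilon = 1e-10, maxit = 50))

# Iterations used by each fit and the largest coefficient discrepancy
c(loose = fit_loose$iter, tight = fit_tight$iter)
max(abs(coef(fit_loose) - coef(fit_tight)))
```

The tighter fit can never stop earlier than the looser one, because any iteration that meets the 1e-10 criterion also meets 1e-4.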

Also, I think you should be using an offset and not dividing by unsousi,
as it seems unlikely that nsininct / unsousi is Poisson.  Something like

glm(nsininct ~ bmm + zonecong + offset(log(unsousi)), family = poisson(),
    data = inc.all.agg)
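
The offset form also aggregates exactly. When the linear predictor depends only on bmm and zonecong, each cell's contribution to the Poisson log-likelihood depends on the data only through the cell totals of nsininct and unsousi, up to a constant. Summing both within cells therefore leaves the estimates unchanged. The following is a minimal R sketch that checks this on simulated, hypothetical data; the column names come from the post, while the values and the "Base" level are invented.

```r
set.seed(2)

# Simulated, hypothetical fine-grained data: many rows per bmm x zonecong cell
n  <- 5000
lv <- c("Base", "Erreur", "Mauvais", "Montreal", "Moyen")  # "Base" is invented
d <- data.frame(
  bmm      = factor(sample(lv, n, replace = TRUE), levels = lv),
  zonecong = rbinom(n, 1, 0.3),
  unsousi  = runif(n, 0.5, 5)
)
d$nsininct <- rpois(n, d$unsousi * exp(-2 + 0.5 * (d$bmm == "Mauvais")))

ctl  <- glm.control(epsilon = 1e-10, maxit = 50)
form <- nsininct ~ bmm + zonecong + offset(log(unsousi))

# Fit on the fine-grained rows
fit_full <- glm(form, family = poisson(), data = d, control = ctl)

# Sum counts and exposure within each bmm x zonecong cell, then refit
d_agg <- aggregate(cbind(nsininct, unsousi) ~ bmm + zonecong,
                   data = d, FUN = sum)
fit_agg <- glm(form, family = poisson(), data = d_agg, control = ctl)

# The estimates should agree to within the convergence tolerance
all.equal(coef(fit_full), coef(fit_agg), tolerance = 1e-6)
```

The same holds for any extra factor that is summed away, provided it does not appear in the model formula.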

It's not clear to me that what you have done does aggregate exactly: the
sum of nsininct / unsousi * unsous is not, in general, equal to
(sum of nsininct) / (sum of unsousi) * (sum of unsous).
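
Two hypothetical rows from one cell are enough to show the inequality. The same effect makes the weighted rate model sensitive to re-aggregation even when the convergence tolerance is tight. The following is a minimal R sketch on simulated data. The column names come from the post; the values, the "Base" level, and the random unsousi/unsous ratio are invented for illustration. suppressWarnings() hides the warnings that R's poisson family gives for a non-integer response.

```r
# Two hypothetical rows that fall in the same bmm x zonecong cell
cell <- data.frame(nsininct = c(1, 3), unsousi = c(2, 1), unsous = c(1, 1))
with(cell, sum(nsininct / unsousi * unsous))            # 3.5
with(cell, sum(nsininct) / sum(unsousi) * sum(unsous))  # 8/3, about 2.667

# The weighted rate model from the post, fitted before and after re-aggregation
set.seed(3)
n  <- 5000
lv <- c("Base", "Erreur", "Mauvais", "Montreal", "Moyen")  # "Base" is invented
d <- data.frame(
  bmm      = factor(sample(lv, n, replace = TRUE), levels = lv),
  zonecong = rbinom(n, 1, 0.3),
  unsous   = runif(n, 0.5, 5)
)
d$unsousi  <- d$unsous * runif(n, 0.8, 1.2)   # invented "adjusted" exposure
d$nsininct <- rpois(n, 0.15 * d$unsousi)

rate_fit <- function(data) {
  suppressWarnings(
    glm(nsininct / unsousi ~ bmm + zonecong, family = poisson(link = log),
        data = data, weights = unsous,
        control = glm.control(epsilon = 1e-10, maxit = 50)))
}
d_agg <- aggregate(cbind(nsininct, unsousi, unsous) ~ bmm + zonecong,
                   data = d, FUN = sum)

# Nonzero in general, even with a tight convergence tolerance
coef(rate_fit(d)) - coef(rate_fit(d_agg))
```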

On Tue, 4 Dec 2001 Gerald.Jean@spgdag.ca wrote:

> Hi S-users,
>
> S+6, on NT4
>
> I have a large, very large data set consisting of several factor variables
> and four continuous variables.  The data comes from transactional data
> and the continuous variables are aggregated (sum) over the factors.  A
> large model has been fitted to this data and, as I needed to do further
> work with only a few of the factors, I re-aggregated over those factors.
> My questions:
>
> Why do I get different coefficients when I fit a two-factor model (in this
> example) to the big data set than when I fit the same model to the smaller
> data set?
>
> > inc.freq.glm.all <- glm(nsininct / unsousi ~ bmm + zonecong,
> +                         family = poisson(link = log),
> +                         data = inc.all.agg, weights = unsous)
>
> > inc.all.agg.2v <- aggregate(inc.all.agg[, c('unsous', 'unsousi',
> +                                             'nsininct', 'en15incf')],
> +                             by = list(bmm = inc.all.agg[, 'bmm'],
> +                               zonecong = inc.all.agg[, 'zonecong']),
> +                             FUN = sum)
>
> > inc.freq.glm.2v <- glm(nsininct / unsousi ~ bmm + zonecong,
> +                        family = poisson(link = log),
> +                        data = inc.all.agg.2v, weights = unsous)
>
> > ttt.merge <- merge(ttt.all, ttt.2v, by = 'row.names')
> > row.names(ttt.merge) <- ttt.merge[, 1]
> > ttt.merge <- ttt.merge[, -1]
> > round(ttt.merge, 5)
>
>               Value.x Std..Error.x  t.value.x  Value.y Std..Error.y  t.value.y
>  (Intercept) -5.07729      0.01282 -395.94985 -5.07304      0.01291 -393.05116
> bmm Erreur    0.15913      0.03744    4.24999  0.14441      0.03781    3.81901
> bmm Mauvais   0.73437      0.05390   13.62538  0.73928      0.05371   13.76433
> bmm Montreal -0.36010      0.03419  -10.53170 -0.36659      0.03495  -10.48868
> bmm Moyen     0.40942      0.03096   13.22550  0.41195      0.03096   13.30679
>     zonecong -0.01763      0.08312   -0.21216 -0.00122      0.08401   -0.01457
>
> I don't understand why the coefficients are not the same; I checked that
> the sums over the factors are correct, and they are.
> Looking at the coefficients and the t-values, it almost looks as if "bmm
> Montreal" and "bmm Moyen" have been swapped in the aggregation process?
> The division by "unsousi" is not a typo: this variable is "unsous"
> adjusted for trends, inflation and other factors deemed appropriate by
> actuaries.
>
> Thanks for any insights into this,
>
> Gérald Jean
> Consulting Analyst (Statistics), Actuarial Department
> telephone            : (418) 835-4900 ext. 7639
> fax                  : (418) 835-6657
> e-mail               : gerald.jean@spgdag.ca
>
> "In God we trust all others must bring data"  W. Edwards Deming
>
> ---------------------------------------------------------------------
> This message was distributed by s-news@lists.biostat.wustl.edu.  To
> unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
> the BODY of the message:  unsubscribe s-news
>

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

