glm has a very sloppy default convergence criterion. You need to tighten
it, for example by setting epsilon = 1e-10, to get reproducible results.

Also, I think you should be using an offset rather than dividing by unsousi,
as it seems unlikely that nsininct / unsousi is Poisson. Something like

    glm(nsininct ~ bmm + zonecong + offset(log(unsousi)), family = poisson(),
        data = inc.all.agg)
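
To make the two suggestions above concrete, here is a minimal sketch on
simulated data. It is written in R syntax; the argument names in S-PLUS
may differ slightly, so treat the control call as an assumption to check
against your own help pages. The data frame sim and its columns (group,
exposure, claims) are invented for illustration and have nothing to do
with the poster's data.

```r
# Simulated stand-in data: every name and number below is hypothetical.
set.seed(42)
n <- 300
sim <- data.frame(
  group    = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  exposure = runif(n, min = 50, max = 500)
)

# Counts are Poisson with mean proportional to exposure.
rate <- exp(-3 + 0.4 * (sim$group == "b") - 0.3 * (sim$group == "c"))
sim$claims <- rpois(n, lambda = sim$exposure * rate)

# Model the counts directly, with log(exposure) as an offset,
# using the default convergence settings.
fit.default <- glm(claims ~ group + offset(log(exposure)),
                   family = poisson(), data = sim)

# Same model with a tightened convergence criterion.
fit.tight <- glm(claims ~ group + offset(log(exposure)),
                 family = poisson(), data = sim,
                 control = glm.control(epsilon = 1e-10, maxit = 50))

# Iteration counts, and how far the default fit stopped from the tight one.
c(default = fit.default$iter, tight = fit.tight$iter)
max(abs(coef(fit.default) - coef(fit.tight)))
```

In current R the default epsilon is 1e-8, so the gap between the two fits
may well be tiny there; the comparison matters more where the default is
looser.
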
It's not clear to me that what you have done aggregates exactly: the
sum of nsininct / unsousi * unsous is not, in general, equal to
(sum of nsininct) / (sum of unsousi) * (sum of unsous).
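
A small numerical sketch of the aggregation point, again in R syntax. The
first part uses two made-up rows that would fall in the same aggregated
cell; the values are hypothetical and only the variable names are borrowed
from the post. The second part, on simulated data with invented names
(fine, g, z, exposure, count), illustrates that a Poisson fit on counts
with a log-exposure offset should be unaffected by summing counts and
exposure over the model's own covariates.

```r
# Hypothetical numbers: two fine-level rows that fall in the same cell.
nsininct <- c(1, 9)     # claim counts
unsousi  <- c(10, 30)   # adjusted exposure
unsous   <- c(20, 40)   # exposure used as weight

fine.total <- sum(nsininct / unsousi * unsous)            # 2 + 12 = 14
agg.total  <- sum(nsininct) / sum(unsousi) * sum(unsous)  # 10/40 * 60 = 15
c(fine = fine.total, aggregated = agg.total)

# Offset formulation on simulated data (all names are illustrative).
set.seed(1)
n <- 200
fine <- data.frame(
  g        = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  z        = sample(0:1, n, replace = TRUE),
  exposure = runif(n, 50, 500)
)
fine$count <- rpois(n, fine$exposure *
                       exp(-3 + 0.3 * (fine$g == "b") - 0.2 * fine$z))

ctrl <- glm.control(epsilon = 1e-10, maxit = 50)

fit.fine <- glm(count ~ g + z + offset(log(exposure)),
                family = poisson(), data = fine, control = ctrl)

# Sum counts and exposure over the covariates used in the model.
agg <- aggregate(fine[, c("count", "exposure")],
                 by = list(g = fine$g, z = fine$z), FUN = sum)

fit.agg <- glm(count ~ g + z + offset(log(exposure)),
               family = poisson(), data = agg, control = ctrl)

# Largest coefficient difference between the two fits.
coef.diff <- max(abs(coef(fit.fine) - coef(fit.agg)))
coef.diff
```

The weighted-ratio totals differ (14 against 15), whereas the two offset
fits should agree up to the convergence tolerance.
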
On Tue, 4 Dec 2001 Gerald.Jean@spgdag.ca wrote:
> Hi S-users,
>
> S+6, on NT4
>
> I have a large, very large data set consisting of several factor variables
> and of four continuous variables. The data comes from transactional data
> and the continuous variables are aggregated (summed) over the factors. A
> large model has been fitted to this data and as I needed to do further work
> with only a few of the factors I re-aggregated over those factors. My
> questions:
>
> Why is it that if I fit a two (in this example) factor model over the big
> data set I get different coefficients than fitting the same model over the
> smaller data set?
>
> > inc.freq.glm.all <- glm(nsininct / unsousi ~ bmm + zonecong,
> + family = poisson(link = log),
> + data = inc.all.agg, weights = unsous)
>
> > inc.all.agg.2v <- aggregate(inc.all.agg[, c('unsous', 'unsousi',
> + 'nsininct', 'en15incf')],
> + by = list(bmm = inc.all.agg[, 'bmm'],
> + zonecong = inc.all.agg[, 'zonecong']),
> + FUN = sum)
>
> > inc.freq.glm.2v <- glm(nsininct / unsousi ~ bmm + zonecong,
> + family = poisson(link = log),
> + data = inc.all.agg.2v, weights = unsous)
>
> > ttt.merge <- merge(ttt.all, ttt.2v, by = 'row.names')
> > row.names(ttt.merge) <- ttt.merge[, 1]
> > ttt.merge <- ttt.merge[, -1]
> > round(ttt.merge, 5)
>
>               Value.x Std..Error.x  t.value.x  Value.y Std..Error.y  t.value.y
> (Intercept)  -5.07729      0.01282 -395.94985 -5.07304      0.01291 -393.05116
> bmm Erreur    0.15913      0.03744    4.24999  0.14441      0.03781    3.81901
> bmm Mauvais   0.73437      0.05390   13.62538  0.73928      0.05371   13.76433
> bmm Montreal -0.36010      0.03419  -10.53170 -0.36659      0.03495  -10.48868
> bmm Moyen     0.40942      0.03096   13.22550  0.41195      0.03096   13.30679
> zonecong     -0.01763      0.08312   -0.21216 -0.00122      0.08401   -0.01457
>
> I don't understand why the coefficients are not the same; I checked that
> the sums over the factors are OK and they are.
> By looking at the coefficients and the t-values it almost looks like "bmm
> Montreal" and "bmm Moyen" have been inverted in the aggregation process?
> The division by "unsousi" is not a typo; this variable is an adjusted
> "unsous" for trends, inflation and other factors deemed appropriate by
> actuaries.
>
> Thanks for any insights into this,
>
> Gérald Jean
> Analyste-conseil (statistiques), Actuariat
> télephone : (418) 835-4900 poste (7639)
> télecopieur : (418) 835-6657
> courrier électronique: gerald.jean@spgdag.ca
>
> "In God we trust all others must bring data" W. Edwards Deming
>
> ---------------------------------------------------------------------
> This message was distributed by s-news@lists.biostat.wustl.edu. To
> unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
> the BODY of the message: unsubscribe s-news
>
--
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595