
Re: Two questions about glm and counts

To: Tristan Lorino <tristan.lorino@lcpc.fr>
Subject: Re: Two questions about glm and counts
From: Prof Brian Ripley <ripley@stats.ox.ac.uk>
Date: Wed, 20 Apr 2005 08:17:56 +0100 (BST)
Cc: s-news@lists.biostat.wustl.edu
In-reply-to: <146101594.20050420090902@lcpc.fr>
References: <146101594.20050420090902@lcpc.fr>
The `correct' way to do this is a product-multinomial model, but a surrogate Poisson GLM can be used. (You do have counts: for each A-B-C-D combination, the number of observations, which may be zero. They are not independent Poisson counts, though, hence `surrogate'.)
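
As a minimal sketch of the table of counts referred to here, assuming the raw data are available as one (A, B, C, D) tuple per observation: the Python below enumerates every combination of levels and counts the observations in each, keeping combinations that never occur with a count of zero. The level names and the handful of records are invented for illustration, and count_table is a hypothetical helper, not a library function.

```python
from collections import Counter
from itertools import product

# Hypothetical level sets for the four categorical variables.
levels = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2"],
    "D": ["d1", "d2", "d3"],
}

# Invented observations, one (A, B, C, D) tuple per row of the raw data.
observations = [
    ("a1", "b1", "c1", "d1"),
    ("a1", "b1", "c1", "d1"),
    ("a2", "b2", "c1", "d3"),
    ("a3", "b1", "c2", "d2"),
    ("a2", "b2", "c1", "d3"),
]

def count_table(observations, levels):
    """Return rows (A, B, C, D, E) for every combination of levels,
    where E is the number of observations with that combination."""
    tally = Counter(observations)
    rows = []
    for combo in product(*levels.values()):
        # A Counter returns 0 for combinations that were never observed.
        rows.append(combo + (tally[combo],))
    return rows

table = count_table(observations, levels)
for row in table[:3]:
    print(*row)
```

In R, as.data.frame(table(A, B, C, D)) gives the same layout, with the count in a column named Freq.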

Examples and a detailed explanation are in chapter 7 of MASS (the book), or in good books on categorical data (e.g. Agresti, 2002). The function multinom() in library nnet is the main tool.
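
The link between the product-multinomial model and the surrogate Poisson GLM can be sketched as follows. The notation is an assumption made for this sketch, not taken from the references: n_{abcd} is the number of observations with A=a, B=b, C=c, D=d, lambda_{bcd} is a free nuisance term for each covariate combination, and eta_{a|bcd} is the part of the linear predictor that involves A.

```latex
% Surrogate model: treat the cell counts as independent Poisson,
% with one free parameter \lambda_{bcd} per covariate combination.
n_{abcd} \sim \mathrm{Poisson}(\mu_{abcd}), \qquad
\log \mu_{abcd} = \lambda_{bcd} + \eta_{a \mid bcd}

% Conditioning on the covariate totals n_{+bcd} = \sum_a n_{abcd}
% turns each (b,c,d) group into a multinomial over the levels of A.
(n_{abcd})_a \mid n_{+bcd} \sim
  \mathrm{Multinomial}\bigl(n_{+bcd},\, (p_{a \mid bcd})_a\bigr), \qquad
p_{a \mid bcd} = \frac{\mu_{abcd}}{\sum_{a'} \mu_{a'bcd}}
               = \frac{\exp(\eta_{a \mid bcd})}{\sum_{a'} \exp(\eta_{a' \mid bcd})}
```

The lambda_{bcd} cancel in the ratio, so only the terms involving A carry information about how A depends on the covariates. That is why the Poisson fit has to include a full set of B-C-D terms for the covariate margin, and why the Poisson distribution itself is only a device.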

Because of Simpson's paradox it is dangerous to look at marginal tables
as you say you did.  If you are unaware of that, please research it.
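
A small numerical illustration of the paradox, as a sketch with counts invented for the purpose: A is reduced to a binary outcome ("success" or not), B has two levels and D has two levels. Within each level of D, b1 has the higher success rate, yet the marginal A-B table that collapses over D ranks b2 higher.

```python
# Invented counts: (successes, trials) for two levels of B within two levels of D.
counts = {
    "d1": {"b1": (9, 10), "b2": (80, 100)},
    "d2": {"b1": (30, 100), "b2": (2, 10)},
}

def rate(successes, trials):
    """Proportion of successes."""
    return successes / trials

def marginal(counts, b):
    """Collapse over D: add up successes and trials for level b of B."""
    s = sum(counts[d][b][0] for d in counts)
    n = sum(counts[d][b][1] for d in counts)
    return s, n

# Success rates of b1 and b2 within each level of D.
for d, by_b in counts.items():
    print(d, {b: round(rate(*sn), 3) for b, sn in by_b.items()})

# Success rates of b1 and b2 in the marginal A-B table.
print("marginal", {b: round(rate(*marginal(counts, b)), 3) for b in ("b1", "b2")})
```

The reversal arises because b1 is mostly observed at d2, where success is rare for both levels of B, while b2 is mostly observed at d1.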

On Wed, 20 Apr 2005, Tristan Lorino wrote:

Hi,

I have four categorical variables (A, B, C and D) in columns, and
around 1,000 observations. Variables "B", "C" and "D" are covariates,
and variable "A" is the variable to explain. I computed cross-tables for
A-B, A-C and A-D: each chi-square test is significant. Now I would like
to fit a model with all the covariates at the same time.
I think that glm should be a good choice, but I cannot find the
right family/link option: the "poisson" family seems to be only for
counts (which is not the case here) and the "binomial" family (with
"logit" link) is only for binary outcomes (all my variables have more
than 2 levels).

The aim is to analyze the joint impact of the three covariates, and
maybe to determine which covariate is the "most related" to A.

Another question remains: how can I create a new file
with five columns, the first four presenting all the possible combinations
of the levels of my four variables (i.e. one combination per
line), and the fifth column (say E) containing the number of
observations for each combination?

Thank you in advance,
Tristan Lorino

--
Laboratoire Central des Ponts et Chaussées
[Division ESAR - Section AGR]
Route de Bouaye BP 4129
44341 Bouguenais Cedex
France
Tél. 33 (0)2 40 84 56 18

--
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595