s-news
[Top] [All Lists]

SUMMARY of RE: [S] Mixed Effects model clarification

To: S-News <s-news@wubios.wustl.edu>
Subject: SUMMARY of RE: [S] Mixed Effects model clarification
From: "COOPER, Andy" <ACOOPER@audubon.org>
Date: Thu, 13 Jan 2000 15:29:35 -0500
Sender: owner-s-news@wubios.wustl.edu

Dear S+ users,

Sorry to take so long in summarizing the responses to my question (I'm still
trying to get a handle on all of it).  First, a public thank you for
everyone who helped on this:

Alan Zaslavsky
Douglas Bates
Renaud Lancelot
Joseph Lucke
Frank Lawrence
Sama LowChoy
Pat Sullivan
Anne York

Rather than posting all the e-mails that have been flying back and forth on
this, I'll try and summarize (which means if I make a mistake on something,
it's most likely my fault in trying to get a handle on all this).  First
thing to note...never actually use "_" in a label.  I was just doing that
here because I think they're easier to read than using ".".  However, if you
use a "_" as a label in Splus, bad things will happen.  My original question
is posted below.

If I am trying only to estimate the success rates in the units that are
contained in my dataset and am not concerned with serial correlation, I
could just use a glm() and treat Unit as a 20-level factor:

glm(Success~Unit + Road_Density + Season_Length, family = binomial(link =
logit),..... )

If I am trying to estimate an equation for the success rates for units in my
data set *and* outside my dataset, then I'd would use a mixed-effects model:

lme(fixed = Success ~ Road_Density + Season_Length, random = Season_Length |
Unit, data = my_data_file)

A number of people pointed out that Road_Density can not have a random
effect because there is no variability within the Units (I'm kicking myself
for having forgotten that).  The above equation will produce a mean success
rate for all units, one parameter that incorporates the effect of
Road_Density, one parameter the incorporates the effect of Season_Length,
and 40 Best Linear Unbiased Predictors (BLUPs)(one intercept predictor for
each of my 20 units and one slope predictor relating to Season_Length for
each of my 20 units).  If I want to predict success rates in a unit that was
not in my dataset, I use the mean and two parameters.  If I want to predict
success rates for a unit that was in my data set, I use the mean, the two
parameters, and that unit's BLUPs.

However, mixed-effects models do not necessarily take into account all the
residual serial correlation.  Therefore, I should also look at adding the
"corr = corAR1()" to my lme() statement to test for more complicated
correlation structure.  

The way one responder put it... I'm caught between a fixed-effects GLM to
handle my non-Gaussain Success variable, and a Gaussian random effects model
with a transformed success rate.  One solution to this is to use generalized
mixed-effects models in SAS (supposedly a GLIMMIX macro) or a program such
as MLwiN (www.ioe.ac.uk/mlwin).  I was also given the reference:"Comparisons
of Software Packages for Generalized Linear Multilevel Models" by X Zhou, AJ
Perkins, and SL Hui in The American Statistician, August 1999, 53(3),
282-290. Another solution, if I still want to go with S+, is to use
Generalized Estimating Equations (GEEs).  There are 2 S+ GEE libraries: gee
and yags (by the same author), available at
biosun1.harvard.edu/~carey/index.ssoft.html.  

Thanks for everyone's help on this!!!  If anyone else has any comments (or
corrections to what I've stated here), please feel free to contact me.  

Thanks again!

-- Andy

***********************************************************
Andy Cooper
Fisheries Conservation Biologist
National Audubon Society, Living Oceans Program
550 South Bay Ave.
Islip, New York 11751
(631) 859-1588
FAX: (631) 581-5268
acooper@audubon.org


> -----Original Message-----
> From: COOPER, Andy 
> Sent: Thursday, January 06, 2000 3:27 PM
> To: S-News
> Subject: [S] Mixed Effects model clarification
> 
> 
> 
> 
> Dear S+ users,
> 
> Please forgive me if this seems like a very basic question, 
> but this is my
> first look at mixed-effect models.  I've searched all my 
> limited resources
> (S+ manuals, Venables and Ripley (1999), and the StatLib 
> archives), and
> can't quite find the answer to my question.  I'm using S-plus 
> 2000 Release 2
> on a PC running Windows95.  Basically, I'm trying to figure 
> out if S+ is
> doing what I think/hope it's doing, and that it's actually 
> the correct thing
> to do.
> 
> I'm trying to estimate an equation to predict the success rate of elk
> hunters in Idaho.  The state is broken up into about 20 
> units, and I have
> data for each of these units for 6 years.  My dependent 
> variable is the
> success rate of hunters in each of the units in each year.  
> Some of the
> independent variables that may be related to the success rate 
> change across
> the years and differ between units (e.g. length of the 
> hunting season that
> year, what weapons were allowed that year, an estimate of the 
> elk population
> size that year, etc).  There are other independent variables 
> that describe
> the physical features of each unit that do not change over 
> time (at least at
> the scale which I'm able to measure them), such as road 
> density, percent
> area that is forested, etc. As such, my data looks a bit like 
> this (but with
> more years, more units, and more variables...these are 
> made-up numbers):
> 
> Unit  Year    Success Season_Length   Elk_Population_Size     
> Road_Density
> 1     1990    0.2             100             1000            0.05
> 1     1991    0.15            120             1020            0.05
> 1     1992    0.18            110             900             0.05
> 2     1990    0.1             60              200             0.02
> 2     1991    0.09            60              250             0.02
> 2     1992    0.15            80              220             0.02
> 
> What I am trying to do is determine which variables should be 
> included, and
> estimate the parameters for the equations :
> 
> S(i,j) = b0 + b1(j) + b2*Season_Length(i,j) + 
> b3*Pop_Size(i,j)... (eq 1)
> b1(j) = a0(j) + a1*Road_Density(j) + 
> a2*Percent_Forested_Area(j) + .... (eq
> 2)
> 
> Where S(i,j) is the properly transformed success rate in year 
> i for unit j.
> Basically, the variable that allows each unit to have a 
> separate intercept,
> b1(j), is a function of the attributes of the unit (which do 
> not change over
> time).  
> 
> To test this, what I think I would use would be the lme() 
> equation as such:
> 
> lme(Success ~ Season_Length, random = ~1|Unit, data = my_data_file)
> 
> This particular equation would allow me to test the significance of
> Season_Length, and produce estimates for the parameters b0, 
> a0(j) and b2.
> This makes sense to me, and I think it's what I'm trying to 
> do and supposed
> to do (But I could easily be wrong).
> 
> Where I get confused is when I'm trying to test for the 
> significance of
> something like Road_Density which is a continuous-valued, 
> constant attribute
> of the Unit.  From what I've been able to decipher so far, I 
> should use the
> command:
> 
> lme(Success ~ Road_Density, random = ~Road_Density|Unit, data =
> my_data_file)
> 
> if I want to allow the parameter for Road_Density to vary by 
> Unit, and the
> command:
> 
> lme(Success ~ Road_Density, random = ~1|Unit, data = my_data_file)
> 
> to test whether the parameter is the same for all Units. What 
> I can't quite
> figure out is how S+ takes into account the fact that Road_Density is
> constant over time, or whether it explicitly needs to.  For 
> instance if this
> were a repeated measures anova (using the aov() statement with
> "+Error(Unit)" included), there are separate tests for within 
> Units and
> between Units, each with the appropriate degrees of freedom.  Does the
> mixed-effect model simply take both of those into account and 
> produce an
> over-all significance test? 
> 
> Is this making any sense?  Am I interpreting all this properly?  Any
> thoughts on whether this is really what I should be doing?
> 
> Thank you for your help!!!!
> 
> Sincerely,
> Andy
> 
> ***********************************************************
> Andy Cooper
> Fisheries Conservation Biologist
> National Audubon Society, Living Oceans Program
> 550 South Bay Ave.
> Islip, New York 11751
> (631) 859-1588
> FAX: (631) 581-5268
> acooper@audubon.org
> 
> --------------------------------------------------------------
> ---------
> This message was distributed by s-news@wubios.wustl.edu.  To 
> unsubscribe
> send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
> message:  unsubscribe s-news
> 
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message:  unsubscribe s-news

<Prev in Thread] Current Thread [Next in Thread>
  • SUMMARY of RE: [S] Mixed Effects model clarification, COOPER, Andy <=