[S] Summary: anova drops off variables

To: S-News <s-news@wubios.wustl.edu>
Subject: [S] Summary: anova drops off variables
From: "W. Keith Moser" <4ester@compuserve.com>
Date: Thu, 28 May 1998 07:03:48 -0400
Sender: owner-s-news@wubios.wustl.edu
S-Newsers:

I received several good replies to my question about why ANOVA (aov)
drops off variables.  All spoke to collinearity of cmpt and the "dropped
variables."

**Alan Zaslavsky wrote:
try print.lm(arst.aov) to see the coefficients.  I suspect that cmpt is a
factor that includes the levels of the variables that fell out of the
model.  e.g. if you had a factor "black" and a factor "white" and another
one "color" with levels black, white, red, orange, green, then if you put
"color" in the model ahead of the other two factors, the latter two factors
are redundant and get dropped out
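
The black/white/color situation described above can be reproduced with a small sketch. It is written in R rather than S-PLUS 4.5, so the output layout may differ, and the data are invented purely for illustration (the names y, black and white here are hypothetical, not from the study). lm() is used instead of aov() only because its coefficient vector shows the NA entries directly.

```r
set.seed(1)

# Invented data mirroring the example: one factor with five colors,
# plus two indicator factors that are fully determined by it.
color <- factor(rep(c("black", "white", "red", "orange", "green"), each = 10))
black <- factor(color == "black")
white <- factor(color == "white")
y     <- rnorm(length(color))      # arbitrary response, no real effect

# color enters first, so black and white add no new columns of information.
fit <- lm(y ~ color + black + white)

coef(fit)    # the coefficients for black and white are NA
anova(fit)   # the table lists only color and Residuals
```

Putting black and white ahead of color in the formula changes which columns end up redundant, but the total number of estimable coefficients stays the same.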

First, I tried print().

#pepa (used below) and arst (used in my initial post) are different
species, but the S-plus effects are the same.

> print(pepa.aov)
Call:
   aov(formula = pepa ~ cmpt + basum + numgrowbrn + season + lastburn,
na.action = na.omit)

Terms:
                    cmpt    basum Residuals 
 Sum of Squares  18.5088   4.9856  385.3390
Deg. of Freedom        5        1       143

Residual standard error: 1.641547 
3 out of 10 effects not estimable
Estimated effects may be unbalanced

I wasn't sure this was telling me why there was a problem.

**John Wallace suggested:
The missing ones must be highly correlated or co-linear with cmpt.  Try
using summary() on the aov() output.

So, I tried summary().  It did not provide different output from anova() in
this case.

> summary(pepa.aov)
           Df Sum of Sq  Mean Sq  F Value     Pr(F) 
     cmpt   5   18.5088 3.701752 1.373727 0.2376347
    basum   1    4.9856 4.985609 1.850169 0.1759042
Residuals 143  385.3390 2.694678                   

**Brian Ripley wrote:
Those variables are aliased with cmpt, that is only take constant values
within each level of cmpt, at least when cmpt is not missing (as it seems
to be in 2 cases [actually, Prof. Ripley, the residual Df differ only because
the two models have different numbers of terms - the term and residual
Df add up to 149 in both]). Try print or summary, which will tell you about
aliasing, and look at alias too.

So, I tried alias (a function which I did not immediately find in the
manuals).  The output confirmed Professor Ripley's (and everyone else's)
suggestions.

> alias(pepa.aov)
Model 
pepa ~ cmpt + basum + numgrowbrn + season + lastburn

Complete 
           (Intercept) cmpt1 cmpt2 cmpt3 cmpt4 cmpt5 basum 
  lastburn  9          -3    -1     1     2    -1         
    season  4          -6     2     1     1     1         
numgrowbrn  5          -3     3    -2     1     1         

Partial 
            (I) c1 c2 c3 c4 c5  b 
(Intercept)      1  1  7  3  3 -9
      cmpt1        -2 -1 -1 -1 -1
      cmpt2            1  1  1 -1
      cmpt3               2  2 -7
      cmpt4                  1 -3
      cmpt5                    -3
      basum                      

Notes:
$"Max. Abs. Corr.":
[1] 0.964
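
For readers who want to try alias() on something small, here is a minimal sketch in R (not S-PLUS 4.5, so the printed layout may differ). The data are simulated and the mapping of season to compartment is made up. The only feature borrowed from the real study is that season is constant within each level of cmpt.

```r
set.seed(2)

# Simulated stand-in: 6 compartments with 25 plots each.
cmpt   <- factor(rep(paste0("c", 1:6), each = 25))
# Hypothetical burn season per compartment; constant within a compartment.
season <- factor(c("dormant", "growing")[c(1, 1, 2, 2, 1, 2)][as.integer(cmpt)])
basum  <- runif(150, 5, 30)        # made-up basal area values
pct    <- rnorm(150)               # made-up percent cover response

fit <- aov(pct ~ cmpt + basum + season)

al <- alias(fit)
al$Complete       # one row: season written as a combination of earlier columns
fit$df.residual   # 150 observations minus 7 estimable coefficients
```

Each row of the Complete component names an aliased coefficient, and the entries in that row give the combination of estimable columns that reproduces it exactly.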


The output is logical, in that cmpt is "compartment", a management unit
where different sequences of prescribed fire (fire is a "hot" topic in
Florida - Georgia right now) are practiced, resulting in different season
(season of burn), numgrowbrn (number of growing season burns) and lastburn
(years since last burn) characteristics.  I had initially tried to use
cor() on these variables but, because they are categorical, I got only
errors (see below).

> cor(numgrowbrn, season)
Error in .C("S_Var2_NA",: There are 150 missing value(s) in x and/or y
passed to cor or var with na.method="fail".  See the help file for other
options for handling missing values.
Dumped
Warning messages:
  150 missing values generated coercing from character to numeric in:
as.double(y)

> cor(season, cmpt)
Error in .C("S_Var2_NA",: There are 300 missing value(s) in x and/or y
passed to cor or var with na.method="fail".  See the help file for other
options for handling missing values.
Dumped
Warning messages:
1: 150 missing values generated coercing from character to numeric in:
as.double(x)
2: 150 missing values generated coercing from character to numeric in:
as.double(y)

The values were _not_ missing; as the warnings show, the NAs were generated
when the character data were coerced to numeric.
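
Since cor() needs numeric input, a cross-tabulation is one way to check whether a categorical variable is constant within each compartment. The sketch below is in R with invented values (the season labels and their assignment to compartments are assumptions, not the study data).

```r
# Invented character data: 6 compartments, 25 plots each.
cmpt   <- rep(paste0("c", 1:6), each = 25)
# Hypothetical season of burn, fixed within each compartment.
season <- c("dormant", "growing")[c(1, 1, 2, 2, 1, 2)][as.integer(factor(cmpt))]

# Count the plots in every cmpt-by-season cell.
tab <- table(cmpt, season)
tab

# If every row has exactly one non-empty cell, season is fully
# determined by cmpt and will be aliased with it in a model.
nested <- all(rowSums(tab > 0) == 1)
nested
```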

One final question: When various regressions, anovas, etc. calculate Df and
pull out one of the categories in a variable [I can't think of the
technical term, sorry]  (for example, the study had six cmpts, but the
various output tables list only five), which one do they take - the first
one or the last one?  When I see various output tables, I wonder which
cmpts or spp (species) they refer to.
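
One way to look into this is to print the contrast matrix for the factor. The sketch below is in R, whose default for unordered factors is treatment contrasts, where the first level is the baseline and gets no column. S-PLUS is understood to default to Helmert contrasts instead (an assumption here, not something confirmed in this thread), in which case the five columns are comparisons among levels and no single level is left out. The factor below is a made-up stand-in for cmpt.

```r
# Made-up six-level factor standing in for cmpt.
cmpt <- factor(rep(paste0("c", 1:6), each = 2))

levels(cmpt)          # the order of the levels decides what "first" means
options("contrasts")  # the contrast functions currently in effect
contrasts(cmpt)       # the coding this session would use for cmpt

# The two codings side by side: 6 levels, 5 columns in both cases.
ct <- contr.treatment(levels(cmpt))  # first row all zero: c1 is the baseline
ch <- contr.helmert(levels(cmpt))    # each column sums to zero
ct
ch
```

Under either coding the factor uses five degrees of freedom, which is why the tables show 5 for a six-level cmpt.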

Thanks to all for your help.

W. Keith Moser, D.F.
Ecological Forestry Research Scientist
Tall Timbers Research Station
Route 1, Box 678
Tallahassee FL  32312-9712  USA
tel:    +001 850 893-4153 ext 247
fax:    +001 850 668-7781
email:  4ester@compuserve.com

-------------------------------------------------------------------
On Wed, 27 May 1998, W. Keith Moser wrote:

> S-Newsers
> 
> I have a ridiculously simple question, but I cannot find the answer in the
> S-plus 4.5 documentation.
> 
> I have a data set where I am examining the percent cover of particular
> species of plants.  I have categorical variables (cmpt, numgrowbrn,
> lastburn, season) and numerical variables (basum).
> 
> I ran two anovas:
> 
> > arst.aov <- aov(arst ~ cmpt + basum + numgrowbrn + season + lastburn,
> na.action = na.omit)
> > anova(arst.aov)
> Analysis of Variance Table
> 
> Response: arst
> 
> Terms added sequentially (first to last)
>            Df Sum of Sq  Mean Sq  F Value     Pr(F) 
>      cmpt   5   179.420 35.88396 1.639962 0.1531631
>     basum   1    48.415 48.41475 2.212641 0.1390859
> Residuals 143  3128.979 21.88098                   
> > arst.nocmpt.aov <- aov(arst ~ basum + numgrowbrn + season + lastburn,
> na.action = na.omit)
> > anova(arst.nocmpt.aov)
> Analysis of Variance Table
> 
> Response: arst
> 
> Terms added sequentially (first to last)
>             Df Sum of Sq  Mean Sq  F Value     Pr(F) 
>      basum   1    16.799 16.79910 0.753463 0.3868164
> numgrowbrn   1    43.061 43.06106 1.931348 0.1667411
>     season   1    62.069 62.06851 2.783858 0.0973765
>   lastburn   1     1.986  1.98568 0.089061 0.7658021
>  Residuals 145  3232.900 22.29586                   
> 
> 
> QUESTION:  Why did the first anova drop off numgrowbrn, season and
> lastburn?
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message:  unsubscribe s-news
