Re: Skewed multinomial data

To: "'Brian R. Mitchell'" <brian.mitchell@uvm.edu>, "'s-news@lists.biostat.wustl.edu'" <s-news@lists.biostat.wustl.edu>
Subject: Re: Skewed multinomial data
From: "Raubertas, Richard" <richard_raubertas@merck.com>
Date: Tue, 14 Dec 2004 13:22:20 -0500
You don't say what kind of "parametric analysis" (model) you
plan to use, but I'll assume it is a standard regression 
analysis and that you really want (as you say) to use the
cover types as covariates, not the response variable.  In
that case:

Nothing about the standard regression model assumes the
*covariates* have a normal distribution.  (A corollary is
that the model does not even assume that the marginal 
distribution of the response variable is normal.)  In fact 
the model assumes that *conditional on fixed values* of the 
covariates, the response has a normal distribution.  Why
this myth about the distribution of covariates persists is beyond me.
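
To illustrate the point, here is a minimal simulation sketch (Python,
standard library only).  Everything in it is an assumption made up for
illustration: the covariate distribution (60% exact zeros, the rest
exponential), the true coefficients, and the helper names `skewness` and
`fit_line`.  It only shows that a badly skewed covariate, and a skewed
marginal response, are compatible with residuals that look normal.

```python
import random
import statistics


def skewness(values):
    """Sample skewness (third standardized moment); near 0 for symmetric data."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(2004)
n = 5000

# Hypothetical covariate (e.g. percent developed): 60% of sites at exactly 0,
# the rest exponential with mean 5, capped at 100.
x = [0.0 if rng.random() < 0.6 else min(100.0, rng.expovariate(1 / 5))
     for _ in range(n)]

# Response is normal *conditional on x*: linear mean plus Gaussian noise.
y = [10.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"estimated slope:         {slope:.3f}")
print(f"skewness of covariate:   {skewness(x):.2f}")
print(f"skewness of response:    {skewness(y):.2f}")
print(f"skewness of residuals:   {skewness(residuals):.2f}")
```

In a run of this sketch the covariate and the marginal response should
both come out strongly skewed while the residuals do not, which is all
the model asks for.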

The model does assume that the effects of the covariates on
the response are linear and additive.  My suggestion is to
use %developed and %grassland as your parameterization
of cover type.  For your data these will mostly be small
numbers, and the question is whether the effect (on the response)
of going from 0% to 1% is the same as that of going from 5% to 6% (say).
If not, you can consider transformations to make the effect
on response linear, or perhaps fit a nonparametric 
smooth curve to model the effect.  Note that if only a small
proportion of observations have values other than (0, 0)
for the covariates, the amount of information for estimating
the effects of cover type will be limited.  In that case,
dichotomizing each covariate as 0/not-0 might do as well as
anything.
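
One rough way to compare the codings mentioned above is to fit each one
and look at the residual sum of squares.  The sketch below (Python,
standard library only) does that on simulated data.  All specifics are
assumptions for illustration: the data-generating model, the choice of
log(1 + x) as the transformation, and the names `rss_simple_ols`,
`pct_developed` and `response`.  With real data you would also want
residual plots, and possibly a smoother, rather than RSS alone.

```python
import math
import random
import statistics


def rss_simple_ols(x, y):
    """Residual sum of squares from least-squares fit of y on one covariate."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


rng = random.Random(1214)
n = 2000

# Hypothetical covariate: mostly exact zeros, otherwise small percentages.
pct_developed = [0.0 if rng.random() < 0.6 else min(100.0, rng.expovariate(1 / 5))
                 for _ in range(n)]

# Assumed truth for this sketch: the effect flattens out as cover increases,
# so going from 0% to 1% matters more than going from 5% to 6%.
response = [10.0 + 3.0 * math.log1p(p) + rng.gauss(0.0, 1.0)
            for p in pct_developed]

# Three candidate codings of the same covariate.
codings = {
    "linear": pct_developed,
    "log1p": [math.log1p(p) for p in pct_developed],
    "binary": [1.0 if p > 0 else 0.0 for p in pct_developed],
}

rss = {name: rss_simple_ols(values, response) for name, values in codings.items()}
for name, value in sorted(rss.items(), key=lambda kv: kv[1]):
    print(f"{name:>6}: RSS = {value:.1f}")
```

Because the simulated effect is nonlinear by construction, the
transformed coding fits best here.  With few non-zero observations the
gap between the codings can shrink, which is when the 0/not-0 version
becomes a reasonable choice.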

Rich Raubertas
Merck & Co.

> -----Original Message-----
> From: s-news-owner@lists.biostat.wustl.edu
> [mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Brian R. Mitchell
> Sent: Tuesday, December 14, 2004 9:25 AM
> To: s-news@lists.biostat.wustl.edu
> Subject: [S] Skewed multinomial data
> 
> 
> Apologies for posting a general stats question on a software-specific
> list, but I have not been able to find any other active stats lists...
> 
> I am hoping to use the proportion of different landscape cover types
> (forest, developed, grassland) in the vicinity of my sampling locations
> as covariates in an analysis.  The three cover types sum to 100% most of
> the time (certain classes like water were excluded) and never exceed
> 100%.  Because of collinearity, I would only use 2 of the 3 cover types
> in the analysis.  But I am stymied by the severe skew in this data: more
> than half the sites were > 95% forested, and a very large number were
> 100% forested.  I understand that the usual procedure for multinomial
> data is an arcsin transform to normalize the data; in this case the
> transformation helps but the data is still severely skewed.
> 
> Has anyone out there dealt with this sort of problem?  I feel like my
> options are 1) transform the habitat data into one or two binary
> categories (and lose information in the data); 2) use the data as-is
> (and risk severe violations of the normality assumption for the
> analysis); or 3) transform the data in a way that will normalize it.
> 
> I'd be interested in any ideas on transformations that might help, as
> well as any thoughts on just how big a problem severely skewed data
> would be for a parametric analysis (and whether categorizing the data
> might be "safer").
> 
> Thanks!
> 
> Brian
> 
> --------------------------------------------------------------
> Brian R. Mitchell
> Post-Doctoral Associate
> University of Vermont
> The Rubenstein School of Environment and Natural Resources
> 81 Carrigan Drive
> Burlington, VT  05405-0088
> (802) 656-2496
> Brian.Mitchell@uvm.edu
> --------------------------------------------------------------

