Apologies for posting a general stats question on a software-specific list,
but I have not been able to find any other active stats lists...
I am hoping to use the proportion of different landscape cover types
(forest, developed, grassland) in the vicinity of my sampling locations as
covariates in an analysis. The three cover types sum to 100% most of the
time (certain classes like water were excluded) and never exceed
100%. Because of collinearity, I would only use 2 of the 3 cover types in
the analysis. But I am stymied by the severe skew in this data: more than
half the sites were > 95% forested, and a very large number were 100%
forested. I understand that the usual procedure for proportion data is an
arcsine square-root transform to normalize the data; in this case the
transformation helps, but the data is still severely skewed.
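
To make the problem concrete, here is a minimal Python sketch of what I mean, taking the arcsine transform to be asin(sqrt(p)). The forest proportions are made up to mimic a pile-up at 100% (they are not my real data), and the helper names arcsine_sqrt and skewness are just my own for illustration. Because the transform is monotone, every site at 100% forest lands on the same transformed value, so the spike stays a spike.

```python
import math

def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return math.asin(math.sqrt(p))

def skewness(values):
    """Moment-based (population) skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical forest proportions (made up, NOT real data):
# a third of the sites are 100% forested, with a long tail toward 0.
forest = [1.0, 1.0, 1.0, 1.0, 0.99, 0.98, 0.97, 0.96, 0.80, 0.55, 0.30, 0.10]
transformed = [arcsine_sqrt(p) for p in forest]

# Count how many sites sit exactly at the maximum before and after.
ties_before = forest.count(max(forest))
ties_after = transformed.count(max(transformed))

print("skewness before:", skewness(forest))
print("skewness after: ", skewness(transformed))
print("sites tied at the maximum:", ties_before, "->", ties_after)
```

With these made-up numbers the skewness is negative both before and after the transform, and the number of sites tied at the maximum is unchanged.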
Has anyone out there dealt with this sort of problem? I feel like my
options are 1) transform the habitat data into one or two binary categories
(and lose information in the data); 2) use the data as-is (and risk severe
violations of the normality assumption for the analysis); or 3) transform
the data in a way that will normalize it.
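
For option 1, this is roughly the coding I have in mind, sketched in Python. The function name code_forest and the 0.95 cutoff are only assumptions for illustration (the cutoff echoes the "> 95% forested" figure above), and the proportions are again made up.

```python
def code_forest(p, mostly_cutoff=0.95):
    """Return (fully_forested, mostly_forested) as 0/1 indicators.

    mostly_cutoff is a hypothetical threshold, not a recommended value.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    fully = 1 if p >= 1.0 else 0
    mostly = 1 if p > mostly_cutoff else 0
    return fully, mostly

# Hypothetical forest proportions (made up, NOT real data).
forest = [1.0, 0.99, 0.96, 0.80, 0.30]
indicators = [code_forest(p) for p in forest]

for p, (fully, mostly) in zip(forest, indicators):
    print(p, fully, mostly)
```

Note that the sites at 0.80 and 0.30 receive identical codes, which is exactly the loss of information I am worried about.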
I'd be interested in any ideas on transformations that might help, as well
as any thoughts on just how big a problem severely skewed data would be for
a parametric analysis (and whether categorizing the data might be "safer").
Thanks!
Brian
------------------------------------------------------------------------------------------
Brian R. Mitchell
Post-Doctoral Associate
University of Vermont
The Rubenstein School of Environment and Natural Resources
81 Carrigan Drive
Burlington, VT 05405-0088
(802) 656-2496
Brian.Mitchell@uvm.edu
------------------------------------------------------------------------------------------