Classification of compositional data [summary]

To: s-news@lists.biostat.wustl.edu
Subject: Classification of compositional data [summary]
From: Gabriel Baud-Bovy <gabriel@shaker.med.umn.edu>
Date: Tue, 25 Jun 2002 11:29:54 +0200
Thanks to Rolf Turner, David Paul and Jim Garrett for their replies.

Rolf Turner and Jim Garrett suggest that I look at "The Statistical
Analysis of Compositional Data" by J. Aitchison, 1987, Chapman & Hall.
The book is out of print but I found an article with the same title
by the same author in J.R. Statist. Soc. B (1982) 44(2):139-177.

David Paul suggested that I look at the Box-Cox transformation:
>See the paper by Jack Tubbs, early 1980's, Communications in 
> Statistics - B. (Sorry, I don't have an exact reference.)

I reproduce below Jim Garrett's reply, which was the most complete and helpful.

Thank you again.

Gabriel Baud-Bovy


>Hello Gabriel,
>
>Saw your message on the S e-mail list.  I get the daily digest, so it's
>easier for me to reply directly.
>
>There's a good book on analysis of compositional data, _The Statistical
>Analysis of Compositional Data_ by J. Aitchison, 1987, Chapman & Hall,
>ISBN: 0412280604.  Unfortunately Amazon.com says it is out of print.  I
>think the gist of it is that if you have K components in your composition,
>then if you transform the proportions using a "generalized logistic"
>transformation, you get K - 1 unconstrained variables which you can
>reasonably model using Gaussian methods.  That is, by working with means
>and covariance(s) on the transformed scale, you have a very rich
>distributional model on the original scale.
>
>The transformation, which I'm sure has been discussed by others, is as
>follows:  let p_1, ..., p_k be your K proportions in the composition.  Let
>
>     xi_1 = log(p_1 / p_k),        (by "xi" I mean the Greek letter)
>     xi_2 = log(p_2 / p_k),
>     ...
>     xi_{k-1} = log(p_{k-1} / p_k)
>
>(If you wanted to go that last step, you'd find that xi_k = log(p_k / p_k)
>= 0.)  The inverse to this transformation is
>
>     p_i = exp(xi_i) / (1 + sum_{j=1}^{k-1} exp(xi_j) ),   i = 1, ..., k-1,
>     p_k = 1 / (1 + sum_{j=1}^{k-1} exp(xi_j) )
>
>though you may not need the inverse transformation.
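As a rough illustration of the transformation and its inverse described above, here is a minimal sketch. It is written in plain Python rather than S-Plus, the function names alr and alr_inverse are my own and not taken from any library, and it assumes that every proportion is strictly positive and that the last component is the baseline.

```python
import math

def alr(p):
    """Log-ratio transform: K positive proportions -> K-1 unconstrained values.

    The last element of p is used as the baseline (an assumption of this sketch).
    """
    if any(x <= 0 for x in p):
        raise ValueError("all proportions must be strictly positive")
    base = p[-1]
    return [math.log(x / base) for x in p[:-1]]

def alr_inverse(xi):
    """Inverse transform: K-1 real values -> K proportions summing to 1."""
    denom = 1.0 + sum(math.exp(v) for v in xi)
    # The baseline component comes back as 1 / denom and is placed last.
    return [math.exp(v) / denom for v in xi] + [1.0 / denom]

p = [0.2, 0.3, 0.5]          # a three-part composition
xi = alr(p)                  # two unconstrained coordinates
back = alr_inverse(xi)       # recovers the original proportions
print(xi)
print(back)
```

A zero proportion makes the logarithm undefined, which is why the sketch rejects it instead of guessing a replacement value.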
>
>In principle the component that you identify with k, which will form the
>"baseline" to which all others are compared, is arbitrary.  You will get
>the same results regardless of which is the baseline.  However, if one of
>your components is never near zero, it might be wise for numerical reasons
>to use that one as the baseline.
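One way to see why the baseline is arbitrary: switching the baseline from component k to component m gives new coordinates log(p_i / p_m) = xi_i - xi_m, with log(p_k / p_m) = -xi_m, which is an invertible linear map of the old ones. Methods that are unaffected by such maps, linear discriminant analysis among them, therefore give the same classification. The sketch below checks the identity numerically. It is plain Python, the helper alr is hypothetical, and indices are 0-based, unlike the 1-based notation in the message.

```python
import math

def alr(p, baseline):
    """Log-ratios of every component against p[baseline], keyed by component index."""
    base = p[baseline]
    return {i: math.log(x / base) for i, x in enumerate(p) if i != baseline}

p = [0.1, 0.6, 0.3]

xi = alr(p, baseline=2)    # coordinates for components 0 and 1, baseline 2
eta = alr(p, baseline=0)   # coordinates for components 1 and 2, baseline 0

# Rebuild the baseline-0 coordinates from the baseline-2 ones:
#   eta_1 = xi_1 - xi_0   and   eta_2 = -xi_0
rebuilt = {1: xi[1] - xi[0], 2: -xi[0]}

for i in sorted(eta):
    print(i, eta[i], rebuilt[i])
```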
>
>At any rate, with this transformation you have (xi_1, ..., xi_{k-1})
>unconstrained inputs to which you could apply discriminant analysis or any
>other of your favorite classification techniques.
>
>As far as S-Plus examples go, the transformation itself is very simple.  As
>for applying the discriminant analysis, there's always Modern Applied
>Statistics with S-Plus by Venables and Ripley, to name one, and there are
>probably others.  Linear discriminant analysis is implemented as the
>function "lda" in the MASS library which accompanies V&R's book, at least
>for S-Plus 2000 for Windows (perhaps newer versions come with this library
>pre-installed).  The help page for "lda" has an example.
>
>Cheers,
>
>-Jim Garrett
>

