Thanks to Rolf Turner, David Paul and Jim Garrett for their replies.
Rolf Turner and Jim Garrett suggested that I look at "The Statistical
Analysis of Compositional Data" by J. Aitchison, 1987, Chapman & Hall.
The book is out of print but I found an article with the same title
by the same author in J.R. Statist. Soc. B (1982) 44(2):139-177.
David Paul suggested that I look at the Box-Cox transformation:
>See the paper by Jack Tubbs, early 1980's, Communications in
> Statistics - B. (Sorry, I don't have an exact reference.)
I reproduce below Jim Garrett's most complete and helpful reply.
Thank you again.
Gabriel Baud-Bovy
>Hello Gabriel,
>
>Saw your message on the S e-mail list. I get the daily digest, so it's
>easier for me to reply directly.
>
>There's a good book on analysis of compositional data, _The Statistical
>Analysis of Compositional Data_ by J. Aitchison, 1987, Chapman & Hall,
>ISBN: 0412280604. Unfortunately Amazon.com says it is out of print. I
>think the gist of it is that if you have K components in your composition,
>then if you transform the proportions using a "generalized logistic"
>transformation, you get K - 1 unconstrained variables which you can
>reasonably model using Gaussian methods. That is, by working with means
>and covariance(s) on the transformed scale, you have a very rich
>distributional model on the original scale.
>
>The transformation, which I'm sure has been discussed by others, is as
>follows: let p_1, ..., p_k be your K proportions in the composition. Let
>
> xi_1 = log(p_1 / p_k),    (by "xi" I mean the Greek letter)
> xi_2 = log(p_2 / p_k),
> ...
> xi_{k-1} = log(p_{k-1} / p_k)
>
>(If you wanted to go that last step, you'd find that xi_k = log(p_k / p_k)
>= 0.) The inverse to this transformation is
>
> p_i = exp(xi_i) / (1 + sum_{j=1}^{k-1} exp(xi_j) ),   i = 1, ..., k-1,
> p_k = 1 / (1 + sum_{j=1}^{k-1} exp(xi_j) ),
>
>though you may not need the inverse transformation.
>
>In principle the component that you identify with k, which will form the
>"baseline" to which all others are compared, is arbitrary. You will get
>the same results regardless of which is the baseline. However, if one of
>your components is never near zero, it might be wise for numerical reasons
>to use that one as the baseline.
>
>At any rate, with this transformation you have (xi_1, ..., xi_{k-1})
>unconstrained inputs to which you could apply discriminant analysis or any
>other of your favorite classification techniques.
>
>As far as S-Plus examples go, the transformation itself is very simple. As
>for applying the discriminant analysis, there's always Modern Applied
>Statistics with S-Plus by Venables and Ripley, to name one, and there are
>probably others. Linear discriminant analysis is implemented as the
>function "lda" in the MASS library which accompanies V&R's book, at least
>for S-Plus 2000 for Windows (perhaps newer versions come with this library
>pre-installed). The help page for "lda" has an example.
>
>Cheers,
>
>-Jim Garrett
>
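
A minimal sketch of the transformation Jim describes and its inverse,
written in Python rather than S-Plus and using only the standard library.
The function names alr and alr_inverse are illustrative (after Aitchison's
term "additive log-ratio"), not from any package, and the last component is
taken as the baseline. It assumes every proportion is strictly positive;
zeros would need separate handling.

```python
import math

def alr(p):
    """Map K strictly positive proportions to K - 1 unconstrained log-ratios.

    The last component of p is used as the baseline.
    """
    if any(x <= 0 for x in p):
        raise ValueError("all proportions must be strictly positive")
    base = p[-1]
    return [math.log(x / base) for x in p[:-1]]

def alr_inverse(xi):
    """Map K - 1 log-ratios back to K proportions that sum to 1."""
    denom = 1.0 + sum(math.exp(v) for v in xi)
    # The first K - 1 components, followed by the baseline component.
    return [math.exp(v) / denom for v in xi] + [1.0 / denom]

# Round trip on a three-part composition.
p = [0.2, 0.3, 0.5]
xi = alr(p)
p_back = alr_inverse(xi)
```

Here xi has two elements, log(0.2 / 0.5) and log(0.3 / 0.5), and p_back
recovers the original composition up to rounding error. The rows of a data
set would each be transformed this way before being passed to a classifier.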