s-news

discriminant function follow-up

To: s-news@lists.biostat.wustl.edu
Subject: discriminant function follow-up
From: Lisa Crozier <lcrozier@uchicago.edu>
Date: Mon, 22 Mar 2004 10:44:34 -0600
Hi folks,
        Apparently several other people have also had trouble doing DFA in S+, so I am posting this follow-up to the list.  I did not receive any solutions to the code problems.
        I did receive advice from multiple people to use CART (classification and regression tree analysis) instead of DFA.  I have included the reasons for this below, along with the link to the required library for S+.

I propose that you use a different approach to model your data: CART classification trees (you can do this in S-Plus with Terry Therneau's RPART library, http://www.stats.ox.ac.uk/pub/MASS3/Winlibs/). In my opinion, classification trees are better suited to your questions:
1) they automatically select the most important explanatory variables and inform you about interactions among them, so you do not need to do a stepwise analysis.
2) the trees they produce are easy to interpret ecologically, and are based on untransformed variables.
3) they accept quantitative as well as qualitative variables, and you do not need to worry about scaling them.
Also, you do not need to limit your sites to those close to the range edge, although you will want to consider prior probabilities when setting up your model.
I have attached a paper that goes into more detail.

Vayssières, M., Plant, R., Allen-Diaz, B. 2000.  Classification trees: an alternative non-parametric approach for predicting species distributions.  Journal of Vegetation Science 11: 679-694.
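
For anyone who wants to try the tree approach described above, here is a minimal sketch. It is written in R syntax and assumes the rpart library is installed; S+ may need small adjustments. The data are simulated, and the variable names (temp, ppt, present) and the equal prior probabilities are assumptions made only for illustration.

```r
library(rpart)

set.seed(1)
n <- 400

# Simulated site data (hypothetical): temperature in deg C, precipitation in mm
sites <- data.frame(
  temp = runif(n, 0, 30),
  ppt  = runif(n, 200, 2000)
)

# Made-up rule: the species is mostly present at warm, reasonably wet sites
p_present <- ifelse(sites$temp > 15 & sites$ppt > 800, 0.9, 0.05)
sites$present <- factor(ifelse(runif(n) < p_present, "present", "absent"),
                        levels = c("absent", "present"))

# Classification tree on the untransformed variables.
# parms$prior sets the prior class probabilities, in the order of the
# factor levels (absent, present); equal priors are an assumption here.
fit <- rpart(present ~ temp + ppt, data = sites, method = "class",
             parms = list(prior = c(0.5, 0.5)))

print(fit)     # the splits, expressed in the original units
printcp(fit)   # cross-validated error by tree size, useful for pruning

# Compare observed and predicted classes on the training sites
pred <- predict(fit, sites, type = "class")
print(table(observed = sites$present, predicted = pred))
```

Changing the prior vector is one way to keep a large number of absence sites from dominating the fit.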


ORIGINAL POST
At 03:22 PM 3/10/2004 -0600, you wrote:
I am trying to generate a DFA model of my data.  My goal is to identify key climatic features associated with the geographical range limit of a particular species.  The data consist of the presence/absence of the species and the environmental characteristics (temperature and precipitation) of each site.  I tried to use code from Venables & Ripley's Modern Applied Statistics, but it doesn't work in S+ 6.1.  I am trying to do these things:
1) figure out whether S+ does stepwise discriminant analysis (or how to do this), or whether it just uses all of the variables entered
2) graph the output to visually evaluate the fit (i.e., as in Figure 11.10 in V&R, with the 1st and 2nd DF on the axes), rather than generate a huge number of plots of all the untransformed variables, which is what the default plot function does
3) determine whether it is a problem that the predictor variables are in different units (e.g., temperature and precipitation), or whether S+ automatically scales them so that the coefficients represent the correct weights in the function
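
A minimal sketch bearing on points 2 and 3, in R syntax and assuming the MASS library (lda, ldahist); S+ may need adjustments. The data are simulated and the names temp, ppt and status are hypothetical. The sketch shows two things: with only two groups (present/absent) there is a single discriminant function, so a plot of the 1st against the 2nd DF is not available; and rescaling the predictors changes the coefficients but not the predicted classes.

```r
library(MASS)

set.seed(2)
n <- 200

# Simulated sites (hypothetical): temperature in deg C, precipitation in mm
absent  <- data.frame(temp = rnorm(n, 10, 3), ppt = rnorm(n, 600, 150))
present <- data.frame(temp = rnorm(n, 18, 3), ppt = rnorm(n, 1100, 150))
sites <- rbind(absent, present)
sites$status <- factor(rep(c("absent", "present"), each = n))

# Fit on the raw units
fit_raw <- lda(status ~ temp + ppt, data = sites)

# Fit again after standardizing each predictor to mean 0, sd 1
scaled <- sites
scaled[c("temp", "ppt")] <- scale(sites[c("temp", "ppt")])
fit_std <- lda(status ~ temp + ppt, data = scaled)

# Coefficients of the discriminant function: these depend on the units,
# so only the standardized ones are comparable across variables
print(fit_raw$scaling)
print(fit_std$scaling)

# Predicted classes are the same either way
pred_raw <- predict(fit_raw, sites)
pred_std <- predict(fit_std, scaled)
same_class <- all(pred_raw$class == pred_std$class)
print(same_class)

# Two groups give one discriminant function (one column of scores)
print(ncol(pred_raw$x))

# So the fit is viewed as histograms of the LD1 scores by group
ldahist(pred_raw$x[, 1], g = sites$status)
```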

A more statistical question is whether I should limit the sites considered in the model to the sites very close to the range edge, so that the function is not biased by an arbitrarily large number of sites well outside the range.  If so, how do I do this objectively?

Lisa Crozier
Dept Biology, Mailstop 351800
University of Washington
Seattle, WA 98195
206-543-4859 (lab)
206-616-2011 (fax)
