
Re: [S] Summary of: efficiency of lmRobMM()

To: Lorenz Gygax <lgygax@access.unizh.ch>, s-news@wubios.wustl.edu
Subject: Re: [S] Summary of: efficiency of lmRobMM()
From: Doug Martin <doug@statsci.com>
Date: Mon, 26 Jul 1999 09:26:51 -0700
In-reply-to: <Pine.LNX.3.96.990726083002.302A-100000@iamgygax.unizh.ch>
Sender: owner-s-news@wubios.wustl.edu
At 08:31 AM 7/26/99 +0200, Lorenz Gygax wrote:
>
>
>Dear all,
>
>Here is a summary of the responses that I got to the following problem:
>
>****************** original problem ***********************************
>
>I am currently trying to run a couple of robust models in Splus
>Version 5.0 Release 2 for Sun SPARC, SunOS 5.5 with the function
>lmRobMM () with the final aim of comparing these models with the
>function anova(,test="RF").
>
>One of the smaller models that I am trying to run has about 450 cases
>and uses one factor as the explanatory variable (with almost 40
>levels; the larger models have one to six additional continuous
>variables). The calculation of this model has now been running for almost
>20 hours of CPU time [it is up to over 100 hours by now].
>
>Does anybody else have experience with the efficiency of this
>function, any idea whether this is normal behaviour and I just need to
>be patient or on how to speed things up?
>
>***********************************************************************
>
>Most answers recommended moving to Splus 5.1 to see whether the
>general increase in efficiency also helps with my problem (Bert Gunter
><bert_gunter@merck.com>, Brian Ripley <ripley@stats.ox.ac.uk>, Sylvia
>Isler <sisler@statsci.com>). I am currently trying to locate our
>shipment of version 5.1 and may soon report on my experience regarding
>lmRobMM.
>
>Doug Martin <doug@statsci.com> provided some additional thoughts and
>hints:
>
>1.  With 40 levels, you have in effect p = 40 (dummy) variables.  The
>default number of resampling subsamples is 4.6*2^p, which is 5.058e+12 for
>p = 40.  This default rule provides a high breakdown point (BP = .5)
>with probability .999.  You can choose to use fewer samples.  But then
>you lose this high probability of high breakdown point.  The details
>may be found in Section 3 of Yohai, Stahel and Zamar (1991) - see
>the Bibliography of the On-Line User Manual Supplement for 4.5 (or
>equivalent for UNIX) for the source of this reference.  Perhaps we
>can provide the details via email on Monday, and check a bit to 
>see how many samples are required for lower probabilities such as
>.9, etc.
>
>2.  Another possibility is to try the genetic algorithm instead of the
>resampling algorithm, experimenting with the algorithm parameters.  I do
>not believe the genetic algorithm offers any guarantee of a high breakdown
>point with high probability.  But some people believe it works
>well (a study we did several years ago with a small number of variables

**  More precisely, a study that Pat Burns did.  And I may have recalled
    it somewhat incorrectly, as on second thought maybe the genetic
    algorithm did better - at least for Pat's specific study.  Sorry for
    that.  Maybe Pat can provide an accurate summary if he is reading this.
    Else, I could look at the report again.

    Doug

>showed that it was very similar to the resampling method). In any event,
>though highly desirable, high-breakdown point is not a be-all and end-all.
>
>3.  On UNIX, S-PLUS 5.1 is faster than 5.0.
>
>4.  More importantly:  For models with (some) factor variables there is a
>much better algorithm than the current resampling algorithm, due to
>Maronna and Yohai (submitted, but not yet published).  It turns out that
>we already started implementing the algorithm, and hope to have a beta
>version soon.  We regard this as a very important improvement to lmRobMM,
>and although it does not solve your problem today, I hope you might want
>to be a beta tester as soon as the new version is available?
>
>5.  Finally, for regression with many variables, e.g., 50 or more, there
>is another "fast" algorithm described by Pena and Yohai in JASA, that
>we will also implement soon.
>
>
>Thank you all for your generous help and advice! I will let you know
>when and how I succeed in solving the problem.
>--                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>Lorenz Gygax             LGygax@amath.unizh.ch;       room: 36-L-40
>                         Department of Applied Mathematics
>                         University of Zuerich-Irchel
>                         Winterthurerstr. 190; CH-8057 Zurich
>                         voice: 41-1-635-58-52  fax: 41-1-635-57-05
>                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
>-----------------------------------------------------------------------
>This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
>send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
>message:  unsubscribe s-news
>
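
Point 1 in the quoted summary gives the default subsample count as 4.6*2^p
and asks how many samples lower probabilities would need.  The sketch below
is only a rough illustration, not the rule lmRobMM actually uses.  It assumes
the usual textbook argument: with a contaminated fraction eps (taken as .5
here), draw N random p-point subsamples and require that at least one is
outlier-free with probability P.  The function names and the eps parameter
are hypothetical and chosen for illustration.

```python
import math


def default_subsamples(p):
    """Default count quoted in the message: 4.6 * 2^p."""
    return 4.6 * 2.0 ** p


def subsamples_needed(p, prob, eps=0.5):
    """Assumed textbook bound, not taken from lmRobMM itself.

    Returns the N solving 1 - (1 - (1 - eps)^p)^N = prob, i.e. the number
    of random p-point subsamples needed so that at least one is free of
    outliers with probability `prob` when a fraction `eps` is contaminated.
    """
    clean = (1.0 - eps) ** p  # chance that a single subsample is clean
    # log1p keeps precision when `clean` is tiny (about 9e-13 for p = 40)
    return math.log(1.0 - prob) / math.log1p(-clean)


if __name__ == "__main__":
    p = 40
    print("default 4.6*2^p:", default_subsamples(p))
    for prob in (0.9, 0.99, 0.999):
        print("P =", prob, "needs about", subsamples_needed(p, prob))
```

Under this approximation the multiplier of 2^p is -ln(1 - P): about 2.3 for
P = .9, 4.6 for P = .99 and 6.9 for P = .999.  The exact rule in lmRobMM may
differ, so these figures are orders of magnitude only.  Either way, lowering
P changes the constant but not the 2^p growth, so for p = 40 the count stays
in the trillions.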
