[S] tech report on boosting

To: s-news@wubios.wustl.edu
Subject: [S] tech report on boosting
From: tibs@utstat.toronto.edu
Date: Thu, 23 Jul 98 21:47 EDT
Sender: owner-s-news@wubios.wustl.edu
                   *** Technical Report Available ***


    Additive Logistic Regression: a Statistical View of Boosting

                           Jerome Friedman
                       (jhf@stat.stanford.edu) 

                            Trevor Hastie
                      (trevor@stat.stanford.edu)


                          Robert Tibshirani
                      (tibs@utstat.toronto.edu)


                               ABSTRACT 

Boosting (Freund & Schapire 1996, Schapire & Singer 1998) is one of
the most important recent developments in classification
methodology. The performance of many classification algorithms often
can be dramatically improved by sequentially applying them to
reweighted versions of the input data, and taking a weighted majority
vote of the sequence of classifiers thereby produced. We show that
this seemingly mysterious phenomenon can be understood in terms of
well-known statistical principles, namely additive modeling and
maximum likelihood.  For the two-class problem, boosting can be viewed
as an approximation to additive modeling on the logistic scale using
maximum Bernoulli likelihood as a criterion. We develop more direct
approximations and show that they yield results nearly identical to
those of boosting. Direct multi-class generalizations based on
multinomial likelihood are derived that exhibit performance comparable
to other recently proposed multi-class generalizations of boosting in
most situations, and far superior in some.  We suggest a minor
modification to boosting that can reduce computation, often by factors
of 10 to 50. Finally, we apply these insights to produce an
alternative formulation of boosting decision trees. This approach,
based on best-first truncated tree induction, often leads to better
performance, and can provide interpretable descriptions of the
aggregate decision rule. It is also much faster computationally, making
it more suitable for large-scale data mining applications.
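
For readers new to the procedure the abstract describes (fit a classifier
repeatedly to reweighted data, then take a weighted majority vote), here is
a minimal sketch of discrete AdaBoost in the style of Freund & Schapire
(1996). It is not code from the report. The decision-stump base learner,
the function names (fit_stump, adaboost, predict), and the toy data are
assumptions chosen only for illustration.

```python
import math


def stump_predict(stump, x):
    """A stump (j, t, s) predicts s when x[j] > t and -s otherwise."""
    j, t, s = stump
    return s if x[j] > t else -s


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the smallest weighted error."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: one below all values, plus the midpoints.
        thresholds = [values[0] - 1.0]
        thresholds += [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for s in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict((j, t, s), row) != yi)
                if err < best_err:
                    best, best_err = (j, t, s), err
    return best, best_err


def adaboost(X, y, rounds=10):
    """Return a list of (vote weight, stump) pairs; labels must be +1/-1."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform weights
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        if err >= 0.5:                     # no better than chance: stop
            break
        err = max(err, 1e-12)              # guard the logarithm
        c = math.log((1 - err) / err)      # vote weight for this round
        ensemble.append((c, stump))
        if err <= 1e-12:                   # perfect fit: nothing to reweight
            break
        # Upweight the misclassified points, then renormalize.
        w = [wi * math.exp(c) if stump_predict(stump, row) != yi else wi
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Weighted majority vote of the stumps."""
    score = sum(c * stump_predict(stump, x) for c, stump in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy 1-D problem that no single stump can classify perfectly.
    X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
    y = [1, 1, -1, -1, 1, 1]
    model = adaboost(X, y, rounds=3)
    print([predict(model, x) for x in X])
```

On the toy data every single stump misclassifies at least two of the six
points, while the weighted vote of three stumps can represent the interval
pattern in the labels. The quantity accumulated in predict is the additive
score whose sign gives the vote; the report interprets this kind of score on
the logistic scale.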

Available by ftp from:

ftp://stat.stanford.edu/pub/friedman/boost.ps.Z

or in 

www://utstat.toronto.edu/tibs/research.html

or 

ftp://utstat.toronto.edu/pub/tibs/boost.ps.Z

Comments welcome.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Rob Tibshirani, Dept of Public Health Sciences, and Dept of Statistics
Univ of Toronto, Toronto, Canada M5S 1A8.
Phone: 416-978-4642 (PMB), 416-978-0673 (stats). FAX: 416 978-8299
computer fax  416-978-1525 (please call or email me to inform)
tibs@utstat.toronto.edu. ftp://utstat.toronto.edu/pub/tibs
http://www.utstat.toronto.edu/~tibs
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

              Another turning point, a fork stuck in the road.
            Time grabs you by the wrist, directs you where to go.
              So make the best of this test, and don't ask why.
             It's not a question, but a lesson learned in time.
          It's something unpredictable, but in the end is right.
                    I hope you had the time of your life.

           So take the photographs, and still frames in your mind.
               Hang it on shelf of good health and good time.
                 Tattoos of memories and dead skin on trial.
              For what it's worth, it was worth all the while.
                    I hope you had the time of your life.
 
Green Day