Boosting
Boosting is a machine learning algorithm that can be used to reduce bias in supervised learning. It addresses a question posed by Michael Kearns:[1] can a set of "weak learners" be combined to create a single "strong learner"? A weak learner generally refers to a classifier whose results are only slightly better than random guessing; a strong learner is a classifier whose results are very close to the true classification.
Boosting algorithms
Most boosting algorithms consist of iteratively training weak classifiers and adding them to a final strong classifier. When they are added, they are typically weighted according to their classification accuracy. After each weak learner is added, the data are usually reweighted so that previously misclassified examples receive more weight, focusing later learners on the hard cases.
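As a concrete illustration, here is a minimal sketch of this reweighting loop in Python. It follows an AdaBoost-style scheme; the decision-stump weak learner, the exponential weight update, and all parameter values are illustrative assumptions, not prescribed by the text above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost(X, y, n_rounds=50):
    """Minimal AdaBoost-style loop: train weak learners on reweighted
    data and combine them with accuracy-dependent weights.
    Assumes y is a NumPy array with labels in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                 # uniform initial sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)   # a "weak" learner
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y])          # weighted error rate
        if err >= 0.5:                      # no better than random: stop
            break
        alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # learner's vote weight
        w *= np.exp(-alpha * y * pred)      # upweight misclassified points
        w /= w.sum()                        # renormalize to a distribution
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def predict(learners, alphas, X):
    """Weighted vote of all weak learners."""
    score = sum(a * h.predict(X) for a, h in zip(alphas, learners))
    return np.sign(score)
```

Because misclassified points gain weight after every round, each new stump concentrates on the examples the earlier ones got wrong, which is exactly the reweighting behavior described above.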
A classic example of a boosting algorithm is AdaBoost. More recent examples include LPBoost, TotalBoost, BrownBoost, MadaBoost, and LogitBoost. Many boosting algorithms can be interpreted within the AnyBoost framework as gradient descent in function space on a convex loss function.
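AdaBoost is also available off the shelf; a minimal usage sketch with scikit-learn (a library chosen here for convenience, not one of the implementations listed below; the dataset and parameters are arbitrary illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Toy dataset; the sizes here are arbitrary choices for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 50 depth-1 trees, combined much like the loop sketched above.
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))   # training accuracy of the combined strong learner
```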
Criticism
In 2008, Phillip Long of Google and Rocco A. Servedio of Columbia University published a paper arguing that these methods are flawed: when the training set contains mislabeled examples, some boosting algorithms keep trying to correctly classify those points, yet fail to produce a model whose accuracy is better than 1/2.[2]
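To see the effect of label noise empirically, one can flip a random fraction of the training labels and watch held-out accuracy fall. The sketch below is an illustrative experiment under assumed settings, not the formal construction from the paper:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
for noise in (0.0, 0.2, 0.4):
    y_noisy = y_tr.copy()
    flip = rng.random(len(y_noisy)) < noise   # mislabel a random fraction
    y_noisy[flip] = 1 - y_noisy[flip]         # binary labels in {0, 1}
    clf = AdaBoostClassifier(n_estimators=100, random_state=0)
    clf.fit(X_tr, y_noisy)
    # Held-out accuracy typically degrades as the noise rate grows.
    print(noise, clf.score(X_te, y_te))
```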
Implementations
- Orange, a free data mining software suite, module Orange.ensemble
- Weka is a machine learning toolkit that offers various implementations of boosting algorithms such as AdaBoost and LogitBoost
- R package GBM (Generalized Boosted Regression Models) implements extensions to Freund and Schapire's AdaBoost algorithm and Friedman's gradient boosting machine.
- jboost; implements AdaBoost, LogitBoost, RobustBoost, BoosTexter and alternating decision trees
References
Footnotes
- Michael Kearns (1988); Thoughts on Hypothesis Boosting, Unpublished manuscript (Machine Learning class project, December 1988)
- Philip M. Long, Rocco A. Servedio, "Random Classification Noise Defeats All Convex Potential Boosters"
Further reading
- Yoav Freund and Robert E. Schapire (1997); A Decision-Theoretic Generalization of On-line Learning and an Application to Boosting, Journal of Computer and System Sciences, 55(1):119-139
- Robert E. Schapire and Yoram Singer (1999); Improved Boosting Algorithms Using Confidence-Rated Predictors, Machine Learning, 37(3):297-336
External links
- Robert E. Schapire (2003); The Boosting Approach to Machine Learning: An Overview, MSRI (Mathematical Sciences Research Institute) Workshop on Nonlinear Estimation and Classification
- An up-to-date collection of papers on boosting