Several authors have suggested viewing boosting as a gradient descent search for a good fit in function space: at each iteration, observations are re-weighted using the gradient of the underlying loss function. We present a weight-decay approach for observation weights that is equivalent to "robustifying" the underlying loss function. At the extreme end of decay, the approach converges to Bagging, which can be viewed as boosting with a linear underlying loss function. We illustrate the practical usefulness of weight decay for improving prediction performance, and we establish an equivalence between one form of weight decay and "Huberizing", a statistical method for making loss functions more robust.
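
To make the weight-decay idea concrete, the following Python sketch is illustrative only: the power-decay and weight-capping schemes, the decay and cap parameters, and the function name boosting_weights are assumptions chosen for exposition, not the paper's exact formulation. Under the exponential loss, an observation with margin m receives boosting weight exp(-m); shrinking these weights flattens the implied loss toward the linear loss of Bagging, while capping them yields a bounded gradient, which corresponds to a loss that is linear in the tail (a Huber-style robustification).

    import numpy as np

    def boosting_weights(margins, decay=1.0, cap=None):
        """Observation weights for exponential-loss boosting, with two
        illustrative robustification schemes (hypothetical forms):
          * power decay: w_i = exp(-m_i) ** decay. As decay -> 0 the
            implied loss flattens toward a linear loss and all weights
            become equal, i.e., Bagging-like behavior.
          * capping: w_i = min(w_i, cap). A bounded gradient corresponds
            to a loss that is linear in the tail, a Huber-style fix.
        """
        w = np.exp(-margins) ** decay          # plain boosting: decay = 1
        if cap is not None:
            w = np.minimum(w, cap)             # bound outliers' influence
        return w / w.sum()                     # normalize to a distribution

    # Badly misclassified points (large negative margins) no longer dominate:
    margins = np.array([2.0, 0.5, -0.5, -3.0])
    print(boosting_weights(margins))                      # standard weights
    print(boosting_weights(margins, decay=0.5, cap=2.0))  # decayed and capped

In the capped variant, the most misclassified point (margin -3.0) receives a fixed weight rather than an exponentially large one, which is the mechanism by which such schemes limit the influence of outliers on the fitted ensemble.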