A boosting approach to remove class label noise

Authors:
Amitava Karmaker;Stephen Kwek
Affiliations:
(Corresponding author: akarmake@cs.utsa.edu) Department of Computer Science, University of Texas at San Antonio, TX 78249, USA;Department of Computer Science, University of Texas at San Antonio, TX 78249, USA
Venue:
International Journal of Hybrid Intelligent Systems - Hybrid Intelligent systems in Ensembles
Year:
2006

References (13):
Bagging predictors. Machine Learning
Discovering informative patterns and data cleaning. Advances in knowledge discovery and data mining
Boosting as entropy projection. COLT '99 Proceedings of the twelfth annual conference on Computational learning theory
An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization. Machine Learning
Induction of Decision Trees. Machine Learning
Some Theoretical Aspects of Boosting in the Presence of Noisy Data. ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Data Squashing for Speeding Up Boosting-Based Outlier Detection. ISMIS '02 Proceedings of the 13th International Symposium on Foundations of Intelligent Systems
Using Boosting to Detect Noisy Data. Revised Papers from the PRICAI 2000 Workshop Reader, Four Workshops held at PRICAI 2000 on Advances in Artificial Intelligence
Boosting Noisy Data. ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Boosting in the presence of noise. Journal of Computer and System Sciences - Special issue: Learning theory 2003
Boosting with averaged weight vectors. MCS'03 Proceedings of the 4th international conference on Multiple classifier systems
An empirical comparison of three boosting algorithms on real data sets with artificial class noise. MCS'03 Proceedings of the 4th international conference on Multiple classifier systems
Identifying and eliminating mislabeled training instances. AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 1

Cited by (1):
A boosting method for process fault detection with detection delay reduction and label denoising. Proceedings of the First International Workshop on Data Mining for Service and Maintenance

Abstract

Ensemble methods are known to improve prediction accuracy over their base learning algorithms, and AdaBoost is one of the best-recognized methods in this class. However, it is susceptible to overfitting training instances corrupted by class label noise. This paper proposes a modification of AdaBoost that is more tolerant to class label noise, which further enhances its ability to boost prediction accuracy. In particular, we observe that in AdaBoost the weight hike of noisy examples can be constrained by careful application of a cut-off on their weights. We study the characteristics of our technique empirically on artificially generated data sets and corroborate the findings on a number of data sets from the UCI repository [1]. In both experimental settings, the results affirm the efficiency of our approach. Finally, we investigate some significant characteristics of our technique in noisy environments.
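
The abstract does not say how the weight cut-off is chosen or applied, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes labels in {-1, +1}, decision stumps as the base learner, and a hypothetical parameter `cap_factor` that limits each example's weight to `cap_factor` times the uniform weight before renormalizing. All function names are illustrative.

```python
import math

def fit_stump(X, y, w):
    """Exhaustively pick the decision stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for s in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (s if row[j] >= t else -s) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    return best  # (weighted error, feature index, threshold, sign)

def stump_predict(stump, row):
    _, j, t, s = stump
    return s if row[j] >= t else -s

def capped_adaboost(X, y, rounds=10, cap_factor=5.0):
    """AdaBoost with a weight cut-off (assumed form: clip, then renormalize)."""
    n = len(X)
    w = [1.0 / n] * n
    cap = cap_factor / n  # hypothetical cut-off, relative to the uniform weight
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        if err >= 0.5:  # base learner is no better than chance
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Standard AdaBoost update: raise weights of misclassified examples.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        # Cut-off step: clip the normalized weights, then renormalize.
        # After renormalizing, a weight can end up slightly above `cap`.
        w = [min(wi / total, cap) for wi in w]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble, w

def predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D data: the true concept is x >= 4; the label at x = 1 is flipped.
X = [[v] for v in range(8)]
y = [-1, 1, -1, -1, 1, 1, 1, 1]
_, w_plain = capped_adaboost(X, y, rounds=1, cap_factor=float("inf"))
_, w_capped = capped_adaboost(X, y, rounds=1, cap_factor=2.0)
```

On this toy set, one round of uncapped boosting leaves the mislabeled example with half of the total weight, while the capped variant with `cap_factor=2.0` leaves it with one third. A cap that is too tight would also suppress genuinely hard examples, so the value used here is purely illustrative.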