A boosting approach to remove class label noise
International Journal of Hybrid Intelligent Systems - Hybrid Intelligent systems in Ensembles
MLMI'07 Proceedings of the 4th international conference on Machine learning for multimodal interaction
Noisy data is inherent in many real-life and industrial modelling situations. If prior knowledge of such data were available, it would be a simple matter to remove or account for the noise and improve model robustness. Unfortunately, in most learning situations the presence of underlying noise is suspected but difficult to detect.

Ensemble classification techniques such as bagging (Breiman, 1996a), boosting (Freund & Schapire, 1997) and arcing algorithms (Breiman, 1997) have received much attention in the recent literature. Such techniques have been shown to reduce classification error on unseen cases, and this paper demonstrates that they may also be employed as noise detectors. Recently defined diagnostics such as the edge and the margin (Breiman, 1997; Freund & Schapire, 1997; Schapire et al., 1998) have been used to explain the improvement in generalisation error when ensemble classifiers are built. The distributions of these measures are key to the noise-detection process introduced in this study.

This paper presents empirical results on edge distributions which confirm existing theories about boosting's tendency to 'balance' error rates. The results are then extended into a methodology whereby boosting may be used to identify noise in training data by examining the changes in edge and margin distributions as boosting proceeds.
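The core idea, that instances whose margins remain low or negative under boosting are candidates for label noise, can be sketched as follows. This is a minimal illustration of the general approach, not the authors' exact algorithm: it uses scikit-learn's AdaBoostClassifier on synthetic data with artificially flipped labels, and flags instances with negative margin (the weighted vote against the instance's own label exceeds the vote for it).

```python
# Hedged sketch: flag suspected label noise via boosting margins.
# Assumes scikit-learn; the dataset, noise rate and threshold are
# illustrative choices, not taken from the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Inject artificial label noise by flipping 5% of the training labels.
noisy_idx = rng.choice(len(y), size=25, replace=False)
y_noisy = y.copy()
y_noisy[noisy_idx] = 1 - y_noisy[noisy_idx]

clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y_noisy)

# For binary AdaBoost, decision_function returns the weighted vote
# normalised by the sum of estimator weights, so it lies in [-1, 1].
# Multiplying by the signed label (+/-1) gives each instance's margin.
signed_labels = 2 * y_noisy - 1
margins = signed_labels * clf.decision_function(X)

# Instances the ensemble still classifies against their given label
# (negative margin) are flagged as suspected noise.
suspected = np.where(margins < 0)[0]
recovered = len(set(suspected) & set(noisy_idx))
print(f"{len(suspected)} suspects; {recovered} of 25 flipped labels flagged")
```

In practice, the paper's methodology tracks how the edge and margin distributions evolve across boosting rounds rather than thresholding a single final margin; the fixed threshold of zero here is simply the most conservative cut-off.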