A boosting approach to remove class label noise
International Journal of Hybrid Intelligent Systems - Hybrid Intelligent systems in Ensembles
MLMI'07 Proceedings of the 4th international conference on Machine learning for multimodal interaction
Noisy data is inherent in many real-life and industrial modelling situations. If prior knowledge of such data were available, it would be a simple matter to remove or account for the noise and improve model robustness. Unfortunately, in most learning situations the presence of underlying noise is suspected but difficult to detect.

Ensemble classification techniques such as bagging (Breiman, 1996a), boosting (Freund & Schapire, 1997) and arcing algorithms (Breiman, 1997) have received much attention in the recent literature. Such techniques have been shown to reduce classification error on unseen cases, and this paper demonstrates that they may also be employed as noise detectors. Recently defined diagnostics such as the edge and the margin (Breiman, 1997; Freund & Schapire, 1997; Schapire et al., 1998) have been used to explain the improvement in generalisation error when ensemble classifiers are built. The distributions of these measures are key to the noise-detection process introduced in this study.

This paper presents empirical results on edge distributions which confirm existing theories about boosting's tendency to 'balance' error rates. The results are then extended into a methodology whereby boosting may be used to identify noise in training data by examining the changes in edge and margin distributions as boosting proceeds.
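The core idea, that instances whose margins remain low or negative under boosting are candidates for label noise, can be sketched as follows. This is a minimal illustration of the general approach, not the authors' exact algorithm: it uses scikit-learn's AdaBoostClassifier on synthetic data with artificially flipped labels, and flags instances with negative margin (the weighted vote against the instance's own label exceeds the vote for it).

```python
# Hedged sketch: flag suspected label noise via boosting margins.
# Assumes scikit-learn; the dataset, noise rate and threshold are
# illustrative choices, not taken from the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Inject artificial label noise by flipping 5% of the training labels.
noisy_idx = rng.choice(len(y), size=25, replace=False)
y_noisy = y.copy()
y_noisy[noisy_idx] = 1 - y_noisy[noisy_idx]

clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y_noisy)

# For binary AdaBoost, decision_function returns the weighted vote
# normalised by the sum of estimator weights, so it lies in [-1, 1].
# Multiplying by the signed label (+/-1) gives each instance's margin.
signed_labels = 2 * y_noisy - 1
margins = signed_labels * clf.decision_function(X)

# Instances the ensemble still classifies against their given label
# (negative margin) are flagged as suspected noise.
suspected = np.where(margins < 0)[0]
recovered = len(set(suspected) & set(noisy_idx))
print(f"{len(suspected)} suspects; {recovered} of 25 flipped labels flagged")
```

In practice, the paper's methodology tracks how the edge and margin distributions evolve across boosting rounds rather than thresholding a single final margin; the fixed threshold of zero here is simply the most conservative cut-off.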