Identifying mislabeled training data with the aid of unlabeled data

Authors:
Donghai Guan;Weiwei Yuan;Young-Koo Lee;Sungyoung Lee
Affiliations:
College of Automation, Harbin Engineering University, Harbin, China 150001;Dept. of Computer Engineering, Kyung Hee University, Yongin-si, Korea 446-701;Dept. of Computer Engineering, Kyung Hee University, Yongin-si, Korea 446-701;Dept. of Computer Engineering, Kyung Hee University, Yongin-si, Korea 446-701
Venue:
Applied Intelligence
Year:
2011



Abstract

This paper presents a new approach for identifying and eliminating mislabeled training instances for supervised learning algorithms. The novelty of the approach lies in using unlabeled instances to aid the detection of mislabeled training instances, in contrast with existing methods, which rely only on the labeled training instances. The approach is straightforward and can be applied to many existing noise detection methods with only marginal modifications. To assess its benefit, we choose two popular noise detection methods: majority filtering (MF) and consensus filtering (CF). MFAUD and CFAUD are the proposed variants of MF and CF that rely on our approach; the names denote majority filtering and consensus filtering with the aid of unlabeled data. An empirical study validates the approach and shows that MFAUD and CFAUD significantly improve the performance of MF and CF under different noise ratios and labeled ratios. In addition, the improvement is more pronounced when the noise ratio is higher.
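
For readers unfamiliar with the two baseline filters named in the abstract, the following is a minimal sketch of plain majority and consensus filtering as they are commonly described: each instance is classified by several learners trained on the other cross-validation folds, and it is flagged as mislabeled when more than half of the learners (majority) or all of them (consensus) disagree with its given label. The abstract does not say how the unlabeled data is incorporated, so the sketch covers only the baselines and is not the authors' MFAUD/CFAUD procedure. The function names, the three toy base learners, the fold assignment, and the data are all assumptions made for illustration.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def knn_learner(k):
    """Build a trainer for a k-nearest-neighbour classifier (illustrative base learner)."""
    def train(X, y):
        def predict(x):
            nearest = sorted(range(len(X)), key=lambda i: sq_dist(X[i], x))[:k]
            return Counter(y[i] for i in nearest).most_common(1)[0][0]
        return predict
    return train


def centroid_learner(X, y):
    """Train a nearest-centroid classifier (illustrative base learner)."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    centroids = {
        label: tuple(sum(col) / len(pts) for col in zip(*pts))
        for label, pts in groups.items()
    }

    def predict(x):
        return min(centroids, key=lambda label: sq_dist(centroids[label], x))
    return predict


def vote_filter(X, y, learners, n_folds=3, scheme="majority"):
    """Return the indices flagged as mislabeled by majority or consensus voting."""
    n = len(X)
    errors = [0] * n  # number of learners that disagree with each given label
    for fold in range(n_folds):
        # Assumed fold assignment: instance i belongs to fold i % n_folds.
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        for train in learners:
            predict = train(X_tr, y_tr)
            for i in test_idx:
                if predict(X[i]) != y[i]:
                    errors[i] += 1
    if scheme == "consensus":
        return [i for i in range(n) if errors[i] == len(learners)]
    return [i for i in range(n) if errors[i] > len(learners) / 2]


# Toy data (hypothetical): two well-separated clusters; index 3 sits near the
# class-0 cluster but carries label 1, simulating a labeling error.
X = [(0, 0), (0, 1), (1, 0), (3, 3), (1, 1), (2, 1),
     (10, 10), (10, 11), (11, 10), (11, 11), (10, 12), (12, 10)]
y = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1]
LEARNERS = [knn_learner(1), knn_learner(3), centroid_learner]

flagged_mf = vote_filter(X, y, LEARNERS, scheme="majority")
flagged_cf = vote_filter(X, y, LEARNERS, scheme="consensus")
print(flagged_mf, flagged_cf)
```

On this toy data both schemes flag only index 3. In general the consensus rule is the more conservative of the two: it discards fewer clean instances but tends to let more noisy ones through, whereas the majority rule removes more of both.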