Class Noise Mitigation Through Instance Weighting

Authors:
Umaa Rebbapragada;Carla E. Brodley
Affiliations:
Dept. of Computer Science, Tufts University, 161 College Ave., Medford, MA 02155, USA;Dept. of Computer Science, Tufts University, 161 College Ave., Medford, MA 02155, USA
Venue:
ECML '07 Proceedings of the 18th European conference on Machine Learning
Year:
2007

Abstract

We describe a novel framework for class noise mitigation that assigns a vector of class membership probabilities to each training instance and uses the confidence in the instance's current label as its weight during training. The probability vector should be calculated such that clean instances have high confidence in their current label, while mislabeled instances have low confidence in their current label and high confidence in their correct label. Past research focuses on techniques that either discard or correct instances. This paper proposes that discarding and correcting are special cases of instance weighting and are thus part of this framework. We propose a method that uses clustering to calculate a probability distribution over the class labels for each instance. We demonstrate that our method improves classifier accuracy relative to training on the original training set. We also demonstrate that instance weighting can outperform discarding.
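
The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the clustering-based probabilities are computed, so the estimator below (each instance receives the empirical label distribution of its cluster) is an assumption made only for illustration, not the authors' method. All function names, the 0.5 threshold, and the toy data are hypothetical as well.

```python
import numpy as np

def cluster_label_distribution(cluster_ids, labels, n_classes):
    """Assumed estimator (not from the paper): give every instance the
    empirical distribution of observed labels within its cluster."""
    cluster_ids = np.asarray(cluster_ids)
    labels = np.asarray(labels)
    probs = np.zeros((len(labels), n_classes))
    for c in np.unique(cluster_ids):
        members = cluster_ids == c
        counts = np.bincount(labels[members], minlength=n_classes)
        probs[members] = counts / counts.sum()
    return probs

def confidence_weights(probs, labels):
    """Instance weight = probability assigned to the instance's current label."""
    labels = np.asarray(labels)
    return probs[np.arange(len(labels)), labels]

def discard_weights(probs, labels, threshold=0.5):
    """Discarding as a special case: weights restricted to 0 or 1.
    The threshold is a hypothetical choice."""
    return (confidence_weights(probs, labels) >= threshold).astype(float)

def correct_labels(probs):
    """Correcting as a special case: relabel each instance to its most
    probable class (and then train with full weight)."""
    return probs.argmax(axis=1)

# Toy data: two clusters, each containing one instance whose label
# disagrees with the cluster majority.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
labels = [0, 0, 0, 1, 1, 1, 1, 0]

probs = cluster_label_distribution(cluster_ids, labels, n_classes=2)
weights = confidence_weights(probs, labels)
kept = discard_weights(probs, labels)
relabeled = correct_labels(probs)
```

In this toy setup the two minority-label instances receive a lower weight than their neighbors, are dropped by the 0/1 variant, and are flipped by the relabeling variant. The continuous weights could then be handed to any learner that accepts per-instance weights, for example through a `sample_weight` argument where one is available.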