We describe a novel framework for class noise mitigation that assigns a vector of class membership probabilities to each training instance and uses the confidence in the current label as a weight during training. The probability vector should be calculated so that a clean instance has high confidence in its current label, while a mislabeled instance has low confidence in its current label and high confidence in its correct label. Past research has focused on techniques that either discard or correct instances. This paper proposes that discarding and correcting are special cases of instance weighting and are therefore part of this framework. We propose a method that uses clustering to calculate a probability distribution over the class labels for each instance. We demonstrate that our method improves classifier accuracy relative to training on the original training set. We also demonstrate that instance weighting can outperform discarding.