Improving Classification by Removing or Relabeling Mislabeled Instances

Authors:
Stéphane Lallich; Fabrice Muhlenbach; Djamel A. Zighed
Venue:
ISMIS '02 Proceedings of the 13th International Symposium on Foundations of Intelligent Systems
Year:
2002



Abstract

Databases commonly contain noisy data, and mislabeled training instances are an important source of that noise. We present a new approach that improves classification accuracy in such cases by applying a preliminary filtering procedure. An example is considered suspect when, in its neighborhood as defined by a geometrical graph, the proportion of examples of the same class is not significantly greater than in the whole database. Such suspect examples in the training data can be removed or relabeled. The filtered training set is then provided as input to a learning algorithm. Our experiments on ten benchmarks from the UCI Machine Learning Repository, using 1-NN as the final algorithm, show that removing gives better results than relabeling. Removing maintains the generalization error rate when 0 to 20% noise is introduced on the class labels, especially when the classes are well separable.
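The filtering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a k-nearest-neighbor neighborhood stands in for the geometrical graph, a one-sided binomial test stands in for the significance test, and relabeling uses a majority vote among neighbors. All function names and the parameters `k` and `alpha` are hypothetical.

```python
import math
from collections import Counter


def binom_upper_tail(successes, n, p):
    """Return P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, j) * p**j * (1 - p) ** (n - j)
        for j in range(successes, n + 1)
    )


def nearest_neighbors(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i.

    Assumption: a k-NN neighborhood replaces the paper's geometrical graph.
    """
    others = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points))
        if j != i
    )
    return [j for _, j in others[:k]]


def find_suspects(points, labels, k=7, alpha=0.1):
    """Flag examples whose same-class neighbor share is not significantly
    greater than the global share of their class.

    Assumption: significance is judged with a one-sided binomial test.
    """
    n = len(points)
    class_counts = Counter(labels)
    suspects = []
    for i in range(n):
        neighbors = nearest_neighbors(points, i, k)
        same = sum(labels[j] == labels[i] for j in neighbors)
        p_global = class_counts[labels[i]] / n
        # A large p-value means the neighborhood gives no evidence
        # that the example belongs to its labeled class.
        if binom_upper_tail(same, len(neighbors), p_global) > alpha:
            suspects.append(i)
    return suspects


def filter_training_set(points, labels, suspects, k=7, mode="remove"):
    """Remove suspects, or relabel them by majority vote of their neighbors
    (the vote is an assumption, not taken from the source)."""
    suspect_set = set(suspects)
    if mode == "remove":
        kept = [i for i in range(len(points)) if i not in suspect_set]
        return [points[i] for i in kept], [labels[i] for i in kept]
    if mode == "relabel":
        new_labels = list(labels)
        for i in suspect_set:
            votes = Counter(labels[j] for j in nearest_neighbors(points, i, k))
            new_labels[i] = votes.most_common(1)[0][0]
        return list(points), new_labels
    raise ValueError("mode must be 'remove' or 'relabel'")


def predict_1nn(train_points, train_labels, query):
    """Classify query with the label of its single nearest training point."""
    best = min(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    return train_labels[best]
```

In this sketch the filtered set returned by `filter_training_set` is what would be passed to the final 1-NN classifier. With very small `k` the binomial test has little power, so almost every example would be flagged; the neighborhood size and `alpha` would need tuning on real data.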