Databases commonly contain noisy data, and mislabeled training instances are an important source of that noise. We present a new approach that improves classification accuracy in this setting through a preliminary filtering procedure. An example is considered suspect when, in its neighborhood as defined by a geometrical graph, the proportion of examples of the same class is not significantly greater than in the whole database. Such suspect examples can be either removed from the training data or relabeled. The filtered training set is then provided as input to a learning algorithm. Our experiments on ten benchmark datasets from the UCI Machine Learning Repository, using 1-NN as the final algorithm, show that removing suspect examples gives better results than relabeling them. Removal keeps the generalization error rate stable when 0 to 20% of class-label noise is introduced, especially when the classes are well separable.
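The filtering step described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the abstract: the neighborhood is approximated by the k nearest neighbors under Euclidean distance (a stand-in for the geometrical graph, whose exact type is not specified here), significance is judged with a one-sided normal-approximation test on proportions, and the names `suspect_flags`, `filter_by_removal`, `k` and `z_crit` are hypothetical. It should not be read as the authors' implementation.

```python
import math

def suspect_flags(points, labels, k=5, z_crit=1.645):
    """Flag examples whose neighborhood is not significantly richer in their own class.

    Assumption: a k-nearest-neighbor neighborhood replaces the geometrical graph,
    and a one-sided z-test (z_crit=1.645, roughly the 5% level) replaces the
    significance test, which the abstract does not specify.
    """
    n = len(points)
    # Proportion of each class in the whole training set.
    prior = {c: labels.count(c) / n for c in set(labels)}
    flags = []
    for i in range(n):
        # Indices of the k closest other examples.
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        same = sum(labels[j] == labels[i] for j in neighbors)
        p0 = prior[labels[i]]
        # Standard error of the local proportion if it equalled the global one.
        se = math.sqrt(p0 * (1 - p0) / len(neighbors))
        # With a single class se is 0; z is then set to 0 and the example is flagged.
        z = (same / len(neighbors) - p0) / se if se > 0 else 0.0
        # Suspect: local same-class proportion not significantly above the global one.
        flags.append(z <= z_crit)
    return flags

def filter_by_removal(points, labels, **kwargs):
    """Return the training set with suspect examples removed."""
    flags = suspect_flags(points, labels, **kwargs)
    kept = [(p, y) for p, y, s in zip(points, labels, flags) if not s]
    return [p for p, _ in kept], [y for _, y in kept]
```

The filtered lists returned by `filter_by_removal` would then be passed to the final learner (1-NN in the experiments reported above). A relabeling variant could instead assign each suspect example the majority class of its neighbors. Note that with small `k` the test has little power, so even clean examples near a class boundary may be flagged.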