Data mining and knowledge discovery aim at producing useful and reliable models from data. Unfortunately, some databases contain noisy data that hinder the generalization of the learned models. An important source of noise is mislabelled training instances. We propose a new approach that improves classification accuracy through a preliminary filtering procedure. An example is considered suspect when, in its neighbourhood defined by a geometrical graph, the proportion of examples of the same class is not significantly greater than in the database as a whole. Such suspect examples in the training data can then be removed or relabelled. The filtered training set is provided as input to the learning algorithm. Our experiments on ten benchmarks from the UCI Machine Learning Repository, using 1-NN as the final classifier, show that removal gives better results than relabelling. Removal keeps the generalization error rate stable when 0 to 20% class noise is introduced, especially when the classes are well separated. Finally, the proposed filtering method is compared to a relaxation relabelling scheme.
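The suspect test described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it uses a plain k-nearest-neighbour graph as a stand-in for the geometrical graph, and a one-sided binomial test for the "significantly greater than the global proportion" criterion. The names `filter_suspects`, `k`, and `alpha` are illustrative assumptions.

```python
import math
import numpy as np

def binom_sf(s, n, p):
    # One-sided upper tail P(X >= s) for X ~ Binomial(n, p).
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(s, n + 1))

def filter_suspects(X, y, k=5, alpha=0.05):
    """Flag instances whose k-NN neighbourhood (a stand-in for the
    paper's geometrical graph) does NOT contain significantly more
    same-class neighbours than the global class proportion predicts."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n = len(y)
    suspect = np.zeros(n, dtype=bool)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                       # exclude the point itself
        nbrs = np.argsort(d)[:k]            # k nearest neighbours
        same = int(np.sum(y[nbrs] == y[i])) # same-class neighbours
        p_global = float(np.mean(y == y[i]))
        # Suspect if the same-class count could plausibly arise by
        # chance under the global class proportion.
        suspect[i] = binom_sf(same, k, p_global) > alpha
    return suspect
```

Following the paper's conclusion, the flagged instances would simply be removed before training the final 1-NN classifier rather than relabelled.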