Instance-Based Learning Algorithms
Machine Learning
C4.5: programs for machine learning
A new definition of neighborhood of a point in multi-dimensional space
Pattern Recognition Letters
Prototype selection for the nearest neighbour rule through proximity graphs
Pattern Recognition Letters
Combining labeled and unlabeled data with co-training
COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Reduction Techniques for Instance-Based Learning Algorithms
Machine Learning
Analysis of new techniques to obtain quality training sets
Pattern Recognition Letters - Special issue: Sibgrapi 2001
Experiments with Noise Filtering in a Medical Domain
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Transductive Inference for Text Classification using Support Vector Machines
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Learning from Labeled and Unlabeled Data using Graph Mincuts
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Identifying and Handling Mislabelled Instances
Journal of Intelligent Information Systems
Unsupervised word sense disambiguation rivaling supervised methods
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Tri-Training: Exploiting Unlabeled Data Using Three Classifiers
IEEE Transactions on Knowledge and Data Engineering
Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples
The Journal of Machine Learning Research
A lot of randomness is hiding in accuracy
Engineering Applications of Artificial Intelligence
Improving software quality prediction by noise filtering techniques
Journal of Computer Science and Technology
IEEE Transactions on Pattern Analysis and Machine Intelligence
KEEL: a software tool to assess evolutionary algorithms for data mining problems
Soft Computing - A Fusion of Foundations, Methodologies and Applications - Special Issue on Evolutionary and Metaheuristics based Data Mining (EMBDM)
Soft Computing - A Fusion of Foundations, Methodologies and Applications
Nearest neighbor editing aided by unlabeled data
Information Sciences: an International Journal
The Top Ten Algorithms in Data Mining
Learning from labeled and unlabeled data: an empirical study across techniques and domains
Journal of Artificial Intelligence Research
Introduction to Semi-Supervised Learning
Introduction to Machine Learning
Information Sciences: an International Journal
Co-training with relevant random subspaces
Neurocomputing
Semi-supervised learning based on nearest neighbor rule and cut edges
Knowledge-Based Systems
Semi-Supervised Learning
Question classification based on co-training style semi-supervised learning
Pattern Recognition Letters
Semi-Supervised Learning via Regularized Boosting Working on Multiple Semi-Supervised Assumptions
IEEE Transactions on Pattern Analysis and Machine Intelligence
Sparse Semi-supervised Learning Using Conjugate Functions
The Journal of Machine Learning Research
Data Mining: Practical Machine Learning Tools and Techniques
When Does Cotraining Work in Real Data?
IEEE Transactions on Knowledge and Data Engineering
Identifying mislabeled training data with the aid of unlabeled data
Applied Intelligence
Prototype Selection for Nearest Neighbor Classification: Taxonomy and Empirical Study
IEEE Transactions on Pattern Analysis and Machine Intelligence
A stochastic approach to Wilson's editing algorithm
IbPRIA'05 Proceedings of the Second Iberian conference on Pattern Recognition and Image Analysis - Volume Part II
SETRED: self-training with editing
PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
DCPE co-training for classification
Neurocomputing
Considerations about sample-size sensitivity of a family of edited nearest-neighbor rules
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Software Quality Analysis of Unlabeled Program Modules With Semisupervised Clustering
IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans
Improve Computer-Aided Diagnosis With Machine Learning Techniques Using Undiagnosed Samples
IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans
Mining With Noise Knowledge: Error-Aware Data Mining
IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans
Nearest neighbor pattern classification
IEEE Transactions on Information Theory
Algorithms of fuzzy clustering with partial supervision
Pattern Recognition Letters
Semi-supervised classification methods have received much attention as suitable tools for tackling training sets that contain a large amount of unlabeled data and only a small quantity of labeled data. Several semi-supervised learning models have been proposed, each with different assumptions about the characteristics of the input data. Among them, the self-training process has emerged as a simple and effective technique that does not require any specific hypotheses about the training data. Despite its effectiveness, the self-training algorithm usually makes erroneous predictions, mainly in its initial stages, when noisy examples are labeled and incorporated into the training set. Noise filters are commonly used to remove corrupted data in standard classification. In 2005, Li and Zhou proposed adding a statistical filter to the self-training process. Nevertheless, in this approach, the filtering method has to deal with a reduced number of labeled instances and the erroneous predictions this may induce. In this work, we analyze the integration of a wide variety of noise filters into the self-training process in order to distinguish the most relevant features of filters. We focus on the nearest neighbor rule as the base classifier and ten different noise filters. We provide an extensive analysis of the performance of these filters considering different ratios of labeled data. The results are contrasted with nonparametric statistical tests that allow us to identify the relevant filters, and their main characteristics, in the field of semi-supervised learning.
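The process described above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes 1-NN as the base classifier and Wilson's Edited Nearest Neighbor rule as the noise filter, and all function names and parameters (batch size, number of neighbors) are illustrative choices.

```python
# Illustrative sketch: self-training with a noise filter.
# Assumptions (not from the paper): 1-NN base classifier, Wilson's
# Edited Nearest Neighbor (ENN) as the filter, fixed-size labeling batches.
import numpy as np

def nn_predict(X_train, y_train, X):
    """1-NN prediction: label of the closest training example."""
    preds = []
    for x in X:
        d = np.linalg.norm(X_train - x, axis=1)
        preds.append(y_train[np.argmin(d)])
    return np.array(preds)

def enn_filter(X, y, k=3):
    """Wilson's ENN: drop examples misclassified by their k neighbors."""
    keep = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                        # exclude the point itself
        nbrs = np.argsort(d)[:k]
        vals, counts = np.unique(y[nbrs], return_counts=True)
        if vals[np.argmax(counts)] == y[i]:  # majority vote agrees
            keep.append(i)
    return X[keep], y[keep]

def self_training(X_l, y_l, X_u, batch=10, max_iter=20):
    """Iteratively label batches of unlabeled data with 1-NN,
    filtering the enlarged labeled set with ENN after each step."""
    X_l, y_l, X_u = X_l.copy(), y_l.copy(), X_u.copy()
    for _ in range(max_iter):
        if len(X_u) == 0:
            break
        take = min(batch, len(X_u))
        y_pred = nn_predict(X_l, y_l, X_u[:take])
        X_l = np.vstack([X_l, X_u[:take]])   # incorporate predictions
        y_l = np.concatenate([y_l, y_pred])
        X_u = X_u[take:]
        X_l, y_l = enn_filter(X_l, y_l)      # remove likely-noisy labels
    return X_l, y_l
```

Note that the filter operates on the enlarged labeled set, which at the initial stages contains very few instances; this is exactly the difficulty the abstract points out for filtering methods in the semi-supervised setting.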