On the characterization of noise filters for self-training semi-supervised in nearest neighbor classification

  • Authors:
  • Isaac Triguero, José A. Sáez, Julián Luengo, Salvador García, Francisco Herrera


  • Venue:
  • Neurocomputing
  • Year:
  • 2014


Abstract

Semi-supervised classification methods have received much attention as suitable tools to tackle training sets with large amounts of unlabeled data and a small quantity of labeled data. Several semi-supervised learning models have been proposed, each with different assumptions about the characteristics of the input data. Among them, the self-training process has emerged as a simple and effective technique that does not require any specific hypotheses about the training data. Despite its effectiveness, the self-training algorithm usually makes erroneous predictions, mainly at the initial stages, if noisy examples are labeled and incorporated into the training set. Noise filters are commonly used to remove corrupted data in standard classification. In 2005, Li and Zhou proposed the addition of a statistical filter to the self-training process. Nevertheless, in this approach, filtering methods have to deal with a reduced number of labeled instances and the erroneous predictions that this scarcity may induce. In this work, we analyze the integration of a wide variety of noise filters into the self-training process in order to identify the most relevant characteristics of the filters. We focus on the nearest neighbor rule as the base classifier and on ten different noise filters. We provide an extensive analysis of the performance of these filters considering different ratios of labeled data. The results are contrasted with nonparametric statistical tests that allow us to identify the relevant filters, and their main characteristics, in the field of semi-supervised learning.
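To make the setting concrete, the sketch below shows one way a noise filter can be wrapped around self-training with a 1-NN base classifier. It is not the authors' procedure (the paper compares ten filters and does not prescribe this loop): the filter here is a simple edited-nearest-neighbour (ENN)-style check, and the names `enn_filter`, `self_training_1nn` and the `per_iter` parameter are illustrative assumptions. Only scikit-learn's `KNeighborsClassifier` is assumed as the nearest neighbor learner.

```python
# Minimal sketch: self-training with a 1-NN base classifier and an ENN-style
# noise filter applied after each labeling step. Illustrative only; not the
# exact algorithm studied in the paper.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def enn_filter(X, y, k=3):
    """Keep instances whose label matches the prediction of a k-NN model
    trained on the same data (a rough edited-nearest-neighbour check)."""
    k = min(k, len(y))
    nn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    # Note: each point appears in its own neighborhood here; a stricter ENN
    # would exclude the point itself before voting.
    keep = nn.predict(X) == y
    return X[keep], y[keep]


def self_training_1nn(X_lab, y_lab, X_unlab, n_iter=10, per_iter=10):
    """Iteratively label the unlabeled pool with 1-NN, filter the enlarged
    labeled set, and repeat. per_iter controls how many examples are
    labeled per iteration (a free parameter of this sketch)."""
    X_lab, y_lab, X_unlab = map(np.asarray, (X_lab, y_lab, X_unlab))
    for _ in range(n_iter):
        if len(X_unlab) == 0:
            break
        base = KNeighborsClassifier(n_neighbors=1).fit(X_lab, y_lab)
        # Pick the unlabeled examples closest to the current labeled set.
        dist, _ = base.kneighbors(X_unlab, n_neighbors=1)
        order = np.argsort(dist.ravel())[:per_iter]
        y_new = base.predict(X_unlab[order])
        # Tentatively enlarge the labeled set, then remove likely noisy points.
        X_aug = np.vstack([X_lab, X_unlab[order]])
        y_aug = np.concatenate([y_lab, y_new])
        X_lab, y_lab = enn_filter(X_aug, y_aug)
        X_unlab = np.delete(X_unlab, order, axis=0)
    return KNeighborsClassifier(n_neighbors=1).fit(X_lab, y_lab)
```

The key design point the abstract raises is visible here: the filter operates on a very small labeled set, especially in early iterations, so an overly aggressive or statistically unreliable filter can discard correct examples or keep mislabeled ones, which is exactly the behavior the paper's comparison of filters aims to characterize.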