Predicting noise filtering efficacy with data complexity measures for nearest neighbor classification

Authors:
José A. Sáez;Julián Luengo;Francisco Herrera
Affiliations:
Department of Computer Science and Artificial Intelligence, University of Granada, CITIC-UGR, Granada 18071, Spain;Department of Civil Engineering, LSI, University of Burgos, Burgos 09006, Spain;Department of Computer Science and Artificial Intelligence, University of Granada, CITIC-UGR, Granada 18071, Spain
Venue:
Pattern Recognition
Year:
2013

Abstract

Classifier performance, particularly of instance-based learners such as k-nearest neighbors, is affected by the presence of noisy data. Noise filters are traditionally employed to remove these corrupted data and improve the classification performance. However, their efficacy depends on the properties of the data, which can be analyzed by what are known as data complexity measures. This paper studies the relation between the complexity metrics of a dataset and the efficacy of several noise filters to improve the performance of the nearest neighbor classifier. A methodology is proposed to extract a rule set based on data complexity measures that enables one to predict in advance whether the use of noise filters will be statistically profitable. The results obtained show that noise filtering efficacy is to a great extent dependent on the characteristics of the data analyzed by the measures. The validation process carried out shows that the final rule set provided is fairly accurate in predicting the efficacy of noise filters before their application and it produces an improvement with respect to the indiscriminate usage of noise filters.
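The pipeline the abstract describes (measure the data, filter noise, check whether the nearest neighbor classifier benefits) can be illustrated with a minimal sketch. The abstract does not name the specific complexity measures or noise filters studied, so the choices here are assumptions made only for illustration: Fisher's discriminant ratio (often called F1) stands in for a data complexity measure, and Wilson's Edited Nearest Neighbor (ENN) stands in for a noise filter. The toy dataset and all function names are hypothetical.

```python
import numpy as np


def fisher_ratio_f1(X, y):
    """Maximum Fisher discriminant ratio over features (two-class case).

    Assumed stand-in for a data complexity measure; higher values
    indicate classes that are easier to separate on a single feature.
    """
    classes = np.unique(y)
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    a, b = X[y == classes[0]], X[y == classes[1]]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0)
    return float(np.max(num / np.maximum(den, 1e-12)))


def enn_filter(X, y, k=3):
    """Wilson's Edited Nearest Neighbor, an assumed example of a noise filter.

    Returns a boolean mask that keeps each instance whose label agrees
    with the majority label of its k nearest neighbors.
    """
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # an instance is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    keep = np.empty(len(y), dtype=bool)
    for i in range(len(y)):
        labels, counts = np.unique(y[neighbors[i]], return_counts=True)
        keep[i] = labels[np.argmax(counts)] == y[i]
    return keep


def one_nn_accuracy(X_train, y_train, X_test, y_test):
    """Accuracy of the 1-nearest-neighbor classifier on a test set."""
    dist = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    pred = y_train[np.argmin(dist, axis=1)]
    return float(np.mean(pred == y_test))


# Hypothetical toy data: two tight clusters plus one mislabeled point
# (the last row sits inside the class-0 cluster but carries label 1).
X_noisy = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                    [10, 10], [10, 11], [11, 10], [11, 11],
                    [0.5, 0.5]], dtype=float)
y_noisy = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
X_test = np.array([[0.4, 0.4], [10.5, 10.5]])
y_test = np.array([0, 1])

keep = enn_filter(X_noisy, y_noisy, k=3)
X_filtered, y_filtered = X_noisy[keep], y_noisy[keep]

f1_noisy = fisher_ratio_f1(X_noisy, y_noisy)
f1_filtered = fisher_ratio_f1(X_filtered, y_filtered)
acc_noisy = one_nn_accuracy(X_noisy, y_noisy, X_test, y_test)
acc_filtered = one_nn_accuracy(X_filtered, y_filtered, X_test, y_test)

print("instances kept:", int(keep.sum()), "of", len(keep))
print("F1 before / after filtering:", f1_noisy, f1_filtered)
print("1-NN accuracy before / after filtering:", acc_noisy, acc_filtered)
```

In this sketch the complexity value and the accuracy gain are only computed side by side for one dataset. The methodology in the abstract goes further: measure values from many datasets would serve as inputs from which a rule set is extracted, so that the benefit of filtering can be predicted before any filter is run. That learning step is not reproduced here.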