Classifier performance, particularly that of instance-based learners such as k-nearest neighbors, is affected by the presence of noisy data. Noise filters are traditionally employed to remove these corrupted data and improve classification performance. However, their efficacy depends on the properties of the data, which can be analyzed with data complexity measures. This paper studies the relation between the complexity metrics of a dataset and the efficacy of several noise filters in improving the performance of the nearest neighbor classifier. A methodology is proposed to extract a rule set, based on data complexity measures, that predicts in advance whether the use of noise filters will be statistically profitable. The results show that noise filtering efficacy depends to a great extent on the characteristics of the data captured by the measures. The validation process shows that the final rule set is fairly accurate in predicting the efficacy of noise filters before their application, and that it improves on the indiscriminate use of noise filters.
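To make the setting concrete, the following minimal sketch shows one noise filter and one complexity-style measure on toy data. The abstract does not name the filters or measures studied, so both choices are assumptions made for illustration: the filter is a Wilson-style edited nearest neighbor rule, and the measure is the leave-one-out error rate of the 1-NN classifier. The function names and the toy data are hypothetical.

```python
from collections import Counter
import math


def k_nearest_labels(X, y, i, k):
    """Labels of the k points closest to X[i], excluding X[i] itself."""
    dists = sorted(
        (math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i
    )
    return [y[j] for _, j in dists[:k]]


def loo_1nn_error(X, y):
    """Leave-one-out error rate of 1-NN (assumed, illustrative measure)."""
    wrong = sum(k_nearest_labels(X, y, i, 1)[0] != y[i] for i in range(len(X)))
    return wrong / len(X)


def enn_filter(X, y, k=3):
    """Wilson-style editing (assumed filter): keep a point only if its label
    matches the majority label of its k nearest neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(k_nearest_labels(X, y, i, k))
        majority = votes.most_common(1)[0][0]
        if majority == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (hypothetical): two clusters, with the fourth point mislabelled.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
     (1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.05, 1.05)]
y = [0, 0, 0, 1, 1, 1, 1, 1]

before = loo_1nn_error(X, y)
Xf, yf = enn_filter(X, y, k=3)
after = loo_1nn_error(Xf, yf)
print(f"1-NN leave-one-out error before filtering: {before:.2f}")
print(f"1-NN leave-one-out error after filtering:  {after:.2f}")
```

On this toy set the single mislabelled point sits in the middle of the first cluster and misleads its 1-NN neighbours, so removing it lowers the error. A rule set of the kind the abstract describes would decide from such measures, computed before filtering, whether applying the filter is likely to pay off; the actual rules and thresholds are not given here.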