Performance Analysis of Class Noise Detection Algorithms

Authors:
Borut Sluban; Dragan Gamberger; Nada Lavrač
Affiliations:
Jožef Stefan International Postgraduate School, Ljubljana, Slovenia; Rudjer Bošković Institute, Zagreb, Croatia; Jožef Stefan Institute, Ljubljana, Slovenia and University of Nova Gorica, Nova Gorica, Slovenia
Venue:
Proceedings of the 2010 conference on STAIRS 2010: Proceedings of the Fifth Starting AI Researchers' Symposium
Year:
2010

Abstract

In real-world datasets, noisy instances and outliers require the special attention of domain experts. While noise filtering algorithms are usually used to improve the accuracy of induced classification models, our aim is to detect noisy instances for inspection by human experts during the data understanding, data cleaning and outlier detection phases. To this end, new algorithms for explicit noise detection have been developed, aiming at the highest possible precision of noise detection within a reasonable recall threshold. The best-performing noise detection algorithms are therefore selected using a variant of the F-measure that combines precision and recall: the F0.5-score, which weights precision twice as much as recall. New variants of ensemble noise filtering approaches to noise detection, using a consensus voting scheme, have been developed. They proved to be significantly better than elementary noise filters at supporting the domain expert in identifying potential outliers and/or erroneous data instances.
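
The two ideas named in the abstract, consensus voting over several noise filters and scoring by the F0.5 measure, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`f_beta`, `consensus_vote`, `precision_recall`), the instance ids and the three filter outputs are all hypothetical toy values chosen for illustration. The sketch assumes the standard F-beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), and reads "consensus" as "flagged by every filter in the ensemble".

```python
from typing import Sequence, Set, Tuple


def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta measure; beta < 1 puts more weight on precision."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom


def consensus_vote(flags_per_filter: Sequence[Set[int]]) -> Set[int]:
    """Return the instance ids flagged as noisy by every filter (assumed reading of 'consensus')."""
    if not flags_per_filter:
        return set()
    return set.intersection(*map(set, flags_per_filter))


def precision_recall(detected: Set[int], true_noise: Set[int]) -> Tuple[float, float]:
    """Precision and recall of a detected set against known noisy ids."""
    hits = len(detected & true_noise)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(true_noise) if true_noise else 0.0
    return precision, recall


# Hypothetical toy data: ids 1-4 are the truly noisy instances.
true_noise = {1, 2, 3, 4}
filter_outputs = [
    {1, 2, 3, 7},  # hypothetical elementary filter A
    {1, 2, 3, 8},  # hypothetical elementary filter B
    {1, 2, 4, 7},  # hypothetical elementary filter C
]

consensus = consensus_vote(filter_outputs)
p_cons, r_cons = precision_recall(consensus, true_noise)
p_single, r_single = precision_recall(filter_outputs[0], true_noise)

print("consensus ids:", sorted(consensus))
print("consensus F0.5:", round(f_beta(p_cons, r_cons), 3))
print("filter A F0.5:", round(f_beta(p_single, r_single), 3))
```

On this toy input the consensus set is smaller but contains no false positives, so it trades recall for precision, which is the trade-off the F0.5 score is designed to reward. The numbers say nothing about the results reported in the paper.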