Performance Analysis of Class Noise Detection Algorithms

  • Authors:
  • Borut Sluban; Dragan Gamberger; Nada Lavrač

  • Affiliations:
  • Jožef Stefan International Postgraduate School, Ljubljana, Slovenia; Rudjer Bošković Institute, Zagreb, Croatia; Jožef Stefan Institute, Ljubljana, Slovenia and University of Nova Gorica, Nova Gorica, Slovenia

  • Venue:
  • Proceedings of the 2010 conference on STAIRS 2010: Proceedings of the Fifth Starting AI Researchers' Symposium
  • Year:
  • 2010

Abstract

In real-world datasets, noisy instances and outliers require special attention from domain experts. While noise filtering algorithms are usually used to improve the accuracy of induced classification models, our aim is to detect noisy instances for inspection by human experts during the phases of data understanding, data cleaning and outlier detection. To this end, new algorithms for explicit noise detection have been developed, aiming at the highest possible precision of noise detection within a reasonable recall threshold. The best performing noise detection algorithms are therefore selected based on a variant of the F-measure combining precision and recall. We use the F0.5-score, which treats precision as twice as important as recall. New variants of ensemble noise filtering approaches to noise detection, using a consensus voting scheme, have been developed. They proved to be significantly better than elementary noise filters in supporting the domain expert in identifying potential outliers and/or erroneous data instances.
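
As an illustration of the evaluation idea described in the abstract, the following minimal sketch combines the flags of several noise filters by consensus voting and scores the result with the standard F-beta formula at beta = 0.5. It is not the authors' implementation: the function names, the boolean-flag representation of filter outputs, and the toy data are assumptions made only for this example.

```python
from typing import Iterable, List, Sequence, Tuple


def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favours precision over recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def consensus_filter(votes: Sequence[Sequence[bool]]) -> List[int]:
    """Return indices of instances flagged as noisy by every filter.

    votes[k][i] is True if the (hypothetical) filter k flags instance i.
    """
    return [i for i, flags in enumerate(zip(*votes)) if all(flags)]


def precision_recall(detected: Iterable[int],
                     true_noise: Iterable[int]) -> Tuple[float, float]:
    """Precision and recall of detected noise against known noisy indices."""
    detected, true_noise = set(detected), set(true_noise)
    true_positives = len(detected & true_noise)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(true_noise) if true_noise else 0.0
    return precision, recall


# Toy example (made-up data): three filters voting on four instances.
votes = [
    [True, False, True, True],
    [True, True, False, True],
    [True, False, False, True],
]
detected = consensus_filter(votes)        # instances flagged by all filters
precision, recall = precision_recall(detected, true_noise=[0, 2, 3])
score = f_beta(precision, recall, beta=0.5)
print(detected, precision, recall, round(score, 4))
```

In this toy case the consensus scheme flags fewer instances than any single filter would, which tends to raise precision at the cost of recall; a precision-weighted score such as F0.5 rewards that trade-off.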