A linear wrapper for sample subset selection in atypical detection

  • Authors:
  • Saeed Hashemi

  • Affiliations:
  • Faculty of Computer Science, Dalhousie University, 6050 University Ave., Halifax, B3H 1W5, Canada. Tel.: +1 902 494 6441

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2005

Abstract

Selecting a sample subset with a wrapper approach to identify atypical examples can be preferable to a filter approach, whose selections may not be consistent with the classifier in use, but the running time of wrappers is prohibitive. The fastest available wrappers are quadratic in the number of examples, which is far too expensive for sample subset selection. This paper presents a linear wrapper method that is roughly 80 times faster than the quadratic wrappers. Atypical points are defined here as the misclassified points that the proposed algorithm, Atypical Sequential Ranking (ASR), finds not useful to the classification task; they may include both outliers and overlapping samples. ASR can identify and rank atypical points in the whole dataset without damaging prediction accuracy, and it is general enough to be used with classifiers that have no reject option. Experiments on 20 benchmark datasets and 5 classifiers show promising results and confirm that this wrapper method has some advantages and can be used in sample subset selection for atypical detection.
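The abstract does not give the steps of ASR, so the following is only a minimal illustrative sketch of the general idea it describes, not the author's algorithm. It assumes a toy nearest-centroid classifier and a separate validation set, and all function names (`fit_centroids`, `predict`, `accuracy`, `flag_atypical`) are hypothetical. The sketch trains once, takes the misclassified training points as candidates, and flags a candidate as atypical if removing it does not lower validation accuracy. It retrains once per candidate, so the number of retrainings grows at most linearly with the number of examples.

```python
from collections import defaultdict


def fit_centroids(X, y):
    """Toy nearest-centroid classifier (an assumption): {label: mean vector}."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(model, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


def accuracy(model, X, y):
    """Fraction of points in (X, y) that the model labels correctly."""
    return sum(predict(model, x) == t for x, t in zip(X, y)) / len(y)


def flag_atypical(X, y, X_val, y_val):
    """Hypothetical wrapper: flag misclassified training points whose
    removal does not lower validation accuracy. Not the paper's ASR."""
    keep = list(range(len(X)))
    model = fit_centroids(X, y)
    base = accuracy(model, X_val, y_val)
    # Candidates are the training points the initial model misclassifies.
    candidates = [i for i in keep if predict(model, X[i]) != y[i]]
    atypical = []
    for i in candidates:  # one retraining per candidate
        trial = [j for j in keep if j != i]
        m = fit_centroids([X[j] for j in trial], [y[j] for j in trial])
        acc = accuracy(m, X_val, y_val)
        if acc >= base:  # removal did not hurt: treat the point as atypical
            keep, base = trial, acc
            atypical.append(i)
    return atypical


# Toy data: the last training point carries label 0 but sits inside class 1.
X = [[0.0], [0.2], [0.4], [5.0], [5.2], [5.4], [5.1]]
y = [0, 0, 0, 1, 1, 1, 0]
X_val = [[0.1], [0.3], [5.3], [4.9]]
y_val = [0, 0, 1, 1]
print(flag_atypical(X, y, X_val, y_val))  # prints [6]
```

Because the classifier is only accessed through fitting and prediction, a sketch like this does not require a reject option, which matches the generality claimed in the abstract; the ranking of atypical points that ASR provides is not reproduced here.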