Linear-Time Wrappers to Identify Atypical Points: Two Subset Generation Methods

Authors:
Saeed Hashemi
Affiliations:
-
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2005

Abstract

The wrapper approach to identifying atypical examples can be preferable to the filter approach, which may not be consistent with the classifier in use, but its running time is prohibitive. The fastest available wrappers are quadratic in the number of examples, which is far too expensive for atypical-point detection. This paper presents a linear-time wrapper that is roughly 75 times faster than the quadratic wrappers, averaged over the 7 classifiers and 20 data sets tested in this research. Two subset generation methods for the wrapper are also introduced and compared. Atypical points are defined in this paper as the misclassified points that the proposed algorithm, Atypical Sequential Removing (ASR), finds not useful to the classification task; they may include outliers as well as overlapping samples. ASR can identify and rank atypical points in the whole data set without damaging prediction accuracy, and it is general enough to be used by classifiers that have no reject option. Experiments on benchmark data sets with different classifiers show promising results and confirm that this wrapper method has some advantages and can be used for atypical-point detection.
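The abstract does not spell out the steps of ASR, so the following is only a minimal sketch of the general wrapper idea it describes: the classifier itself nominates misclassified training points, and a point is flagged as atypical only if removing it does not lower accuracy on held-out data. Everything here is an assumption made for illustration, including the nearest-centroid classifier, the function names (`fit_centroids`, `flag_atypical`), the use of a separate validation set, and the toy data. It is not the paper's algorithm, and it does not reproduce the paper's subset generation methods or its running-time analysis.

```python
from collections import defaultdict


def fit_centroids(points, labels):
    """Toy nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, defaultdict(int)
    for p, y in zip(points, labels):
        if y not in sums:
            sums[y] = [0.0] * len(p)
        for i, v in enumerate(p):
            sums[y][i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, p):
    """Return the label of the closest class centroid."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return min(centroids, key=lambda y: sqdist(centroids[y]))


def accuracy(centroids, points, labels):
    hits = sum(predict(centroids, p) == y for p, y in zip(points, labels))
    return hits / len(points)


def flag_atypical(train_x, train_y, val_x, val_y):
    """Hypothetical wrapper-style pass (an illustration, not ASR itself).

    Candidates are the training points the fitted classifier misclassifies.
    Each candidate is visited once; it is removed and flagged only if
    validation accuracy does not drop after retraining without it.
    """
    keep = list(range(len(train_x)))
    model = fit_centroids(train_x, train_y)
    best = accuracy(model, val_x, val_y)
    candidates = [i for i in keep if predict(model, train_x[i]) != train_y[i]]

    flagged = []
    for i in candidates:
        trial = [j for j in keep if j != i]
        trial_model = fit_centroids([train_x[j] for j in trial],
                                    [train_y[j] for j in trial])
        trial_acc = accuracy(trial_model, val_x, val_y)
        if trial_acc >= best:  # removal did not hurt, so keep it removed
            keep, best = trial, trial_acc
            flagged.append(i)
    return flagged, best


# Toy data (made up): two well-separated classes plus one point at index 6
# that sits inside class 0 but carries label 1.
train_x = [(0, 0), (1, 0), (0, 1), (10, 10), (9, 10), (10, 9), (0.5, 0.5)]
train_y = [0, 0, 0, 1, 1, 1, 1]
val_x = [(0, 0.5), (9.5, 9.5)]
val_y = [0, 1]

flagged, acc = flag_atypical(train_x, train_y, val_x, val_y)
print(flagged, acc)
```

In this sketch the order in which points are flagged serves as a crude ranking, and any classifier that exposes fit and predict steps could replace the nearest-centroid model, since no reject option is required. The loop retrains once per candidate, so the count of retrainings grows linearly with the number of misclassified points; that is a property of this sketch only and says nothing about how the paper achieves its linear-time bound.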