Outlier detection by example

Authors:
Cui Zhu;Hiroyuki Kitagawa;Spiros Papadimitriou;Christos Faloutsos
Affiliations:
College of Computer Science, Beijing University of Technology, Beijing, People's Republic of China 100124;Graduate School of Systems and Information Engineering, Center for Computational Sciences, University of Tsukuba, Ibaraki, Japan 305-8577;IBM T.J. Watson, Hawthorne, USA;Carnegie Mellon University, Pittsburgh, USA
Venue:
Journal of Intelligent Information Systems
Year:
2011

Citing 18
Cited 0

Robust regression and outlier detection

Robust regression and outlier detection
Data clustering: a review

ACM Computing Surveys (CSUR)
LOF: identifying density-based local outliers

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Efficient algorithms for mining outliers from large data sets

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Support vector domain description

Pattern Recognition Letters - Special issue on pattern recognition in practice VI
On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Outlier detection for high dimensional data

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Discovering outlier filtering rules from unlabeled data: combining a supervised learner with an unsupervised learner

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
SVM binary classifier ensembles for image classification

Proceedings of the tenth international conference on Information and knowledge management
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
When Is ''Nearest Neighbor'' Meaningful?

ICDT '99 Proceedings of the 7th International Conference on Database Theory
Algorithms for Mining Distance-Based Outliers in Large Datasets

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Finding Intensional Knowledge of Distance-Based Outliers

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Distance-based outliers: algorithms and applications

The VLDB Journal — The International Journal on Very Large Data Bases
PEBL: positive example based learning for Web page classification using SVM

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Outlier Mining in Large High-Dimensional Data Sets

IEEE Transactions on Knowledge and Data Engineering
Example-Based Robust Outlier Detection in High Dimensional Datasets

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Detecting outliers using transduction and statistical testing

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

Outlier detection is a useful technique in such areas as fraud detection, financial analysis and health monitoring. Many recent approaches detect outliers according to reasonable, pre-defined concepts of an outlier (e.g., distance-based, density-based, etc.). However, the definition of an outlier differs between users or even datasets. This paper presents a solution to this problem by including input from the users. Our OBE (Outlier By Example) system is the first that allows users to provide examples of outliers in low-dimensional datasets. By incorporating a small number of such examples, OBE can successfully develop an algorithm by which to identify further outliers based on their outlierness. Several algorithmic challenges and engineering decisions must be addressed in building such a system. We describe the key design decisions and algorithms in this paper. In order to interact with users having different degrees of domain knowledge, we develop two detection schemes: OBE-Fraction and OBE-RF. Our experiments on both real and synthetic datasets demonstrate that OBE can discover values that a user would consider outliers.