Capabilities of outlier detection schemes in large datasets, framework and methodologies

Authors:
Jian Tang; Zhixiang Chen; Ada Waichee Fu; David W. Cheung
Affiliations:
Memorial University of Newfoundland, St. John's, Department of Computer Science, Newfoundland, Canada; University of Texas-Pan American, Edinburg, Department of Computer Science, Texas, USA; Chinese University of Hong Kong, Department of Computer Science and Engineering, Shatin, Hong Kong; University of Hong Kong, Department of Computer Science and Information Systems, Pokfulam, Hong Kong
Venue:
Knowledge and Information Systems
Year:
2006

Abstract

Outlier detection is concerned with discovering exceptional behaviors of objects. Its theoretical principles and practical implementations lay a foundation for important applications such as credit card fraud detection, discovery of criminal behavior in e-commerce, and computer intrusion detection. In this paper, we first present a unified model for several existing outlier detection schemes and propose a compatibility theory, which establishes a framework for describing the capabilities of various outlier formulation schemes in terms of how well they match users' intuitions. Under this framework, we show that the density-based scheme is more powerful than the distance-based scheme when a dataset contains patterns with diverse characteristics. The density-based scheme, however, is less effective when patterns have densities comparable to those of the outliers. We then introduce a connectivity-based scheme that improves on the effectiveness of the density-based scheme when a pattern itself has a density similar to that of an outlier. We compare the density-based and connectivity-based schemes in terms of their strengths and weaknesses, and demonstrate applications with different features in which each is more effective than the other. Finally, the connectivity-based and density-based schemes are evaluated comparatively on both real-life and synthetic datasets in terms of recall, precision, rank power, and implementation-free metrics.
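To make the idea of a connectivity-based scheme more concrete, here is a minimal Python sketch of a connectivity-based outlier factor (COF). It is not the authors' implementation. The formulation is an assumption based on our reading of the COF definition from the authors' related work: a greedy "set-based nearest" trail, an average chaining distance whose edge weights are 2(r+1-i)/(r(r+1)), and a score equal to a point's chaining distance divided by the mean chaining distance of its neighbors. The neighborhood is also simplified to exactly k nearest neighbors with ties broken by index. The function names `knn`, `ac_dist` and `cof_scores` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn(points: Sequence[Point], p: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[p].

    Simplification (assumption): exactly k neighbours, ties broken by index,
    instead of the full k-distance neighbourhood.
    """
    others = [q for q in range(len(points)) if q != p]
    others.sort(key=lambda q: (math.dist(points[p], points[q]), q))
    return others[:k]


def ac_dist(points: Sequence[Point], p: int, group: List[int]) -> float:
    """Average chaining distance of p within {p} plus its neighbours `group`."""
    visited = [p]
    remaining = list(group)
    costs: List[float] = []
    # Greedy set-based nearest trail: starting from p, repeatedly attach the
    # remaining point closest to ANY already-visited point (Prim-style).
    while remaining:
        best_q, best_d = remaining[0], math.inf
        for q in remaining:
            d = min(math.dist(points[q], points[v]) for v in visited)
            if d < best_d:
                best_q, best_d = q, d
        visited.append(best_q)
        remaining.remove(best_q)
        costs.append(best_d)
    r = len(costs)
    if r == 0:
        return 0.0
    # Edge i (1-based) gets weight 2(r+1-i)/(r(r+1)); the weights sum to 1
    # and favour the edges built closest to p.
    return sum(2 * (r + 1 - i) / (r * (r + 1)) * c
               for i, c in enumerate(costs, start=1))


def cof_scores(points: Sequence[Point], k: int) -> List[float]:
    """COF of every point: its ac-dist over the mean ac-dist of its kNN."""
    neigh = [knn(points, p, k) for p in range(len(points))]
    ac = [ac_dist(points, p, neigh[p]) for p in range(len(points))]
    scores = []
    for p in range(len(points)):
        denom = sum(ac[o] for o in neigh[p])
        if denom == 0:  # every neighbour coincides with its own neighbours
            scores.append(1.0 if ac[p] == 0 else math.inf)
        else:
            scores.append(len(neigh[p]) * ac[p] / denom)
    return scores


if __name__ == "__main__":
    # Hypothetical toy data: an evenly spaced line plus one point lifted off it.
    pts = [(float(i), 0.0) for i in range(10)] + [(4.5, 2.0)]
    scores = cof_scores(pts, k=3)
    top = max(range(len(pts)), key=scores.__getitem__)
    print(top, round(scores[top], 3))  # prints: 10 1.531
```

On this toy input every point on the line scores about 1.0, because each chain along the line uses unit-length edges, while the lifted point scores about 1.531, because the first edge of its chain is longer. The example only checks that the sketch runs and ranks the lifted point first. It does not reproduce the paper's experiments or its comparison with the density-based scheme.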