Mining top-n local outliers in large databases

Authors:
Wen Jin;Anthony K. H. Tung;Jiawei Han
Affiliations:
Simon Fraser University, Burnaby, B.C., Canada;Simon Fraser University, Burnaby, B.C., Canada;Simon Fraser University, Burnaby, B.C., Canada
Venue:
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2001

Abstract

Outlier detection is an important task in data mining with numerous applications, including credit card fraud detection and video surveillance. A recent work on outlier detection has introduced a novel notion of local outlier, in which the degree to which an object is outlying depends on the density of its local neighborhood, and each object can be assigned a Local Outlier Factor (LOF) that represents the likelihood of that object being an outlier. Although the concept of local outliers is a useful one, computing LOF values for every data object requires a large number of k-nearest-neighbor searches and can be computationally expensive. Since most objects are usually not outliers, it is useful to give users the option of finding only the n most outstanding local outliers, i.e., the top-n data objects that are most likely to be local outliers according to their LOFs. However, if the pruning is not done carefully, finding the top-n outliers could require the same amount of computation as finding the LOF for all objects. In this paper, we propose a novel method to efficiently find the top-n local outliers in large databases. The concept of a "micro-cluster" is introduced to compress the data, and an efficient micro-cluster-based local outlier mining algorithm is designed based on this concept. As our algorithm can be adversely affected by overlap among the micro-clusters, we propose a meaningful cut-plane solution for overlapping data. Formal analysis and experiments show that this method can achieve good performance in finding the most outstanding local outliers.
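
To make the cost argument in the abstract concrete, the following is a minimal brute-force sketch of the baseline that the paper sets out to avoid: compute the LOF of every object with one k-nearest-neighbor search per object, then keep the n largest scores. It follows the standard LOF definition from the earlier LOF work, not the micro-cluster algorithm proposed here. The function names (`knn`, `lof_scores`, `top_n_local_outliers`) and the sample points are hypothetical, and the sketch assumes Euclidean distance, no duplicate points, and neighborhoods of exactly k objects (ties at the k-th distance are broken arbitrarily).

```python
import heapq
import math


def knn(points, i, k):
    """Return (distance, index) pairs for the k nearest neighbors of points[i]."""
    dists = [(math.dist(points[i], q), j) for j, q in enumerate(points) if j != i]
    return heapq.nsmallest(k, dists)


def lof_scores(points, k):
    """Brute-force LOF for every point (assumes len(points) > k, no duplicates)."""
    # One k-NN search per object: this is the expensive step.
    neighbors = [knn(points, i, k) for i in range(len(points))]
    # k-distance of a point = distance to its k-th nearest neighbor.
    k_dist = [nbrs[-1][0] for nbrs in neighbors]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for nbrs in neighbors:
        mean_reach = sum(max(k_dist[j], d) for d, j in nbrs) / k
        lrd.append(1.0 / mean_reach)

    # LOF(p) = mean of lrd(o) / lrd(p) over the k neighbors o of p.
    return [
        sum(lrd[j] for _, j in nbrs) / (k * lrd[i])
        for i, nbrs in enumerate(neighbors)
    ]


def top_n_local_outliers(points, k, n):
    """Indices of the n points with the largest LOF, most outlying first."""
    scores = lof_scores(points, k)
    return heapq.nlargest(n, range(len(points)), key=scores.__getitem__)


# Hypothetical data: a small dense group plus one isolated point (index 5).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
print(top_n_local_outliers(points, k=2, n=1))  # [5]
```

Every object pays for a full neighbor search here even though only n scores are reported, which is the redundancy the abstract describes. Points inside the dense group get LOF values near 1, while the isolated point scores far higher.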