Ranking outliers using symmetric neighborhood relationship

Authors:
Wen Jin;Anthony K. H. Tung;Jiawei Han;Wei Wang
Affiliations:
School of Computing Science, Simon Fraser University;Department of Computer Science, National University of Singapore;Department of Computer Science, Univ. of Illinois at Urbana-Champaign;Department of Computer Science, Fudan University
Venue:
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Year:
2006

Citing 21
Cited 27

Nearest neighbor queries

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
BIRCH: an efficient data clustering method for very large databases

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
CURE: an efficient clustering algorithm for large databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
LOF: identifying density-based local outliers

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Influence sets based on reverse nearest neighbor queries

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Efficient algorithms for mining outliers from large data sets

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Outlier detection for high dimensional data

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Mining top-n local outliers in large databases

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Detecting graph-based spatial outliers: algorithms and applications (a summary of results)

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Algorithms for Mining Distance-Based Outliers in Large Datasets

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Finding Intensional Knowledge of Distance-Based Outliers

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Efficient and Effective Clustering Methods for Spatial Data Mining

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Enhancing Effectiveness of Outlier Detections for Low Density Patterns

PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Mining Deviants in a Time Series Database

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Rule-based anomaly pattern detection for detecting disease outbreaks

Eighteenth national conference on Artificial intelligence
Mining Deviants in Time Series Data Streams

SSDBM '04 Proceedings of the 16th International Conference on Scientific and Statistical Database Management
Clustering objects on a spatial network

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Outlier Detection Using k-Nearest Neighbour Graph

ICPR '04 Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), Volume 3
AutoPart: parameter-free graph partitioning and outlier detection

PKDD '04 Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases
Aggregate Nearest Neighbor Queries in Road Networks

IEEE Transactions on Knowledge and Data Engineering
Data Mining: Concepts and Techniques

On efficient spatial matching

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
LDBOD: A novel local distribution based outlier detector

Pattern Recognition Letters
Neighborhood rough set based heterogeneous feature subset selection

Information Sciences: an International Journal
Angle-based outlier detection in high-dimensional data

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Outlier Detection Based on Voronoi Diagram

ADMA '08 Proceedings of the 4th international conference on Advanced Data Mining and Applications
On efficient mutual nearest neighbor query processing in spatial databases

Data & Knowledge Engineering
Efficient mutual nearest neighbor query processing for moving object trajectories

Information Sciences: an International Journal
Enhancing effectiveness of density-based outlier mining scheme with density-similarity-neighbor-based outlier factor

Expert Systems with Applications: An International Journal
DBStrata: a system for density-based clustering and outlier detection based on stratification

Proceedings of the Fourth International Conference on SImilarity Search and APplications
Noisy data elimination using mutual k-nearest neighbor for classification mining

Journal of Systems and Software
Extended rough set-based attribute reduction in inconsistent incomplete decision systems

Information Sciences: an International Journal
Neighborhood rough sets for dynamic data mining

International Journal of Intelligent Systems
Attribute reduction of data with error ranges and test costs

Information Sciences: an International Journal
NMGRS: Neighborhood-based multigranulation rough sets

International Journal of Approximate Reasoning
A minimum spanning tree-inspired clustering-based outlier detection technique

ICDM'12 Proceedings of the 12th Industrial conference on Advances in Data Mining: applications and theoretical aspects
Algorithms for detecting outliers via clustering and ranks

IEA/AIE'12 Proceedings of the 25th international conference on Industrial Engineering and Other Applications of Applied Intelligent Systems: advanced research in applied artificial intelligence
A survey on unsupervised outlier detection in high-dimensional numerical data

Statistical Analysis and Data Mining
Continuous kernel-based outlier detection over distributed data streams

ISPA'07 Proceedings of the 2007 international conference on Frontiers of High Performance Computing and Networking
Continuous adaptive outlier detection on distributed data streams

HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
Enhancing density-based clustering: Parameter reduction and outlier detection

Information Systems
Subsampling for efficient and effective unsupervised outlier detection ensembles

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Enhancing one-class support vector machines for unsupervised anomaly detection

Proceedings of the ACM SIGKDD Workshop on Outlier Detection and Description
Local and global scaling reduce hubs in space

The Journal of Machine Learning Research
Enhancing minimum spanning tree-based clustering by removing density-based outliers

Digital Signal Processing
Reverse-k-Nearest-Neighbor join processing

SSTD'13 Proceedings of the 13th international conference on Advances in Spatial and Temporal Databases
Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection

Data Mining and Knowledge Discovery
Ensembles for unsupervised outlier detection: challenges and research questions a position paper

ACM SIGKDD Explorations Newsletter

Abstract

Mining outliers in a database means finding exceptional objects that deviate from the rest of the data set. Besides classical outlier analysis algorithms, recent studies have focused on mining local outliers, i.e., outliers whose density distribution differs significantly from that of their neighborhood. The density distribution at the location of an object has so far been estimated from the density distribution of its k-nearest neighbors [2,11]. However, when an outlier lies where the density distributions in the neighborhood differ significantly, for example when objects from a sparse cluster are close to a denser cluster, this estimate can be wrong. To avoid this problem, we propose a simple but effective measure of local outliers based on a symmetric neighborhood relationship. The proposed measure considers both the neighbors and the reverse neighbors of an object when estimating its density distribution. As a result, the outliers discovered are more meaningful. To compute such local outliers efficiently, we develop several mining algorithms that detect the top-n outliers under our definition. A comprehensive performance evaluation and analysis shows that our methods are not only efficient to compute but also more effective at ranking outliers.
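
The abstract does not give the formula for the measure, so the following is only an illustrative sketch of the idea of combining neighbors and reverse neighbors. It assumes that density can be approximated as the inverse of an object's k-distance, and that the score is the average density over the union of an object's k-nearest neighbors and reverse k-nearest neighbors, divided by the object's own density. The function names (knn_indices, symmetric_outlier_scores, top_n_outliers) are hypothetical, the search is brute force, and the points are assumed to be distinct; the efficient algorithms mentioned in the abstract are not reproduced here.

```python
import math


def knn_indices(points, k):
    """For each point, return its k nearest neighbor indices and its k-distance."""
    neighbors, kdist = [], []
    for i, p in enumerate(points):
        # Brute-force distances to every other point, nearest first.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
        kdist.append(dists[k - 1][0])
    return neighbors, kdist


def symmetric_outlier_scores(points, k):
    """Score each point using both its neighbors and its reverse neighbors."""
    neighbors, kdist = knn_indices(points, k)
    n = len(points)

    # j is a reverse neighbor of i when i appears in j's k-nearest-neighbor list.
    reverse = [set() for _ in range(n)]
    for j, nbrs in enumerate(neighbors):
        for i in nbrs:
            reverse[i].add(j)

    # Assumption: density is the inverse of the k-distance (points must be distinct).
    density = [1.0 / d for d in kdist]

    scores = []
    for i in range(n):
        # Symmetric neighborhood: union of neighbors and reverse neighbors.
        influence = set(neighbors[i]) | reverse[i]
        avg_density = sum(density[j] for j in influence) / len(influence)
        # A score well above 1 means the point is sparser than its surroundings.
        scores.append(avg_density / density[i])
    return scores


def top_n_outliers(points, k, n):
    """Return the indices of the n highest-scoring points, strongest first."""
    scores = symmetric_outlier_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_n_outliers(data, k=2, n=1))
```

In this toy data set the isolated point (10, 10) has no reverse neighbors, so its score is driven entirely by the dense neighbors it points to, and it ranks first.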