A vertical distance-based outlier detection method with local pruning

Authors:
Dongmei Ren;Imad Rahal;William Perrizo;Kirk Scott
Affiliations:
North Dakota State University, Fargo, ND;North Dakota State University, Fargo, ND;North Dakota State University, Fargo, ND;The University of Alaska Anchorage, Anchorage, AK
Venue:
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Year:
2004

Citing 15

Reasoning about naming systems
ACM Transactions on Programming Languages and Systems (TOPLAS)

Constraint satisfaction and debugging for interactive user interfaces

A study on video browsing strategies

The cubic mouse: a new device for three-dimensional input
Proceedings of the SIGCHI conference on Human Factors in Computing Systems

Efficient algorithms for mining outliers from large data sets
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data

Data mining: concepts and techniques

The P-tree algebra
Proceedings of the 2002 ACM symposium on Applied computing

Discovery-Driven Exploration of OLAP Data Cubes
EDBT '98 Proceedings of the 6th International Conference on Extending Database Technology: Advances in Database Technology

OPTICS-OF: Identifying Local Outliers
PKDD '99 Proceedings of the Third European Conference on Principles of Data Mining and Knowledge Discovery

Algorithms for Mining Distance-Based Outliers in Large Datasets
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases

Finding Intensional Knowledge of Distance-Based Outliers
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases

k-nearest Neighbor Classification on Spatial Data Streams Using P-trees
PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining

An optimized approach for KNN text categorization using P-trees
Proceedings of the 2004 ACM symposium on Applied computing

DataMIME™
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data

Detecting graph-based spatial outliers
Intelligent Data Analysis

Cited 5

A constant factor approximation algorithm for k-median clustering with outliers
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms

Online spam-blog detection through blog search
Proceedings of the 17th ACM conference on Information and knowledge management

ODDC: outlier detection using distance distribution clustering
PAKDD'07 Proceedings of the 2007 international conference on Emerging technologies in knowledge discovery and data mining

Correlation-based detection of attribute outliers
DASFAA'07 Proceedings of the 12th international conference on Database systems for advanced applications

Detecting spam blogs from blog search results
Information Processing and Management: an International Journal

Abstract

"One person's noise is another person's signal." Outlier detection is used both to clean up datasets and to discover useful anomalies, such as criminal activity in electronic commerce, computer intrusion attacks, terrorist threats, and agricultural pest infestations. Outlier detection is therefore critically important in an information-based society. This paper focuses on finding outliers in large datasets using distance-based methods. First, to speed up outlier detection, we revise Knorr and Ng's distance-based outlier definition; second, we adopt a vertical data structure, instead of traditional horizontal structures, to make outlier detection more efficient still. We tested our methods on a National Hockey League dataset and show an order-of-magnitude speed improvement over contemporary distance-based outlier detection approaches.
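
For orientation, the distance-based definition that the abstract builds on can be sketched in a few lines. As commonly stated, Knorr and Ng call an object a DB(p, D)-outlier if at least a fraction p of the objects in the dataset lie farther than distance D from it. The Python sketch below is a naive quadratic scan over a horizontal (row-by-row) layout; it illustrates only that original definition, not the revised definition or the vertical data structure this paper proposes, which the abstract does not spell out. The function name `db_outliers` and the sample points are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def db_outliers(points, p, d):
    """Return indices of DB(p, d)-outliers found by a naive O(n^2) scan.

    Assumption: Euclidean distance; a point is flagged when at least
    a fraction p of all points lie farther than d from it.
    """
    n = len(points)
    outliers = []
    for i, a in enumerate(points):
        # Count the points lying farther than d from a
        # (a itself is at distance 0, so it is never counted).
        far = sum(1 for b in points if dist(a, b) > d)
        if far >= p * n:
            outliers.append(i)
    return outliers


# Hypothetical data: a tight cluster near the origin and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
result = db_outliers(points, p=0.75, d=1.0)
```

Here only the distant point at index 4 has at least 75% of the dataset farther than 1.0 away, so it alone is flagged. The quadratic cost of this scan is the kind of expense that faster distance-based methods aim to avoid.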