A minimum spanning tree-inspired clustering-based outlier detection technique

Authors:
Xiaochun Wang; Xia Li Wang; D. Mitch Wilkes
Affiliations:
School of Electronics and Information, Xi'an Jiaotong University, Xi'an, China; Department of Computer Science, Changan University, Xi'an, China; School of Engineering, Vanderbilt University, Nashville, TN
Venue:
ICDM'12: Proceedings of the 12th Industrial Conference on Advances in Data Mining: Applications and Theoretical Aspects
Year:
2012

References (24):

Temporal sequence learning and data reduction for anomaly detection. ACM Transactions on Information and System Security (TISSEC).
LOF: identifying density-based local outliers. SIGMOD '00: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data.
Efficient algorithms for mining outliers from large data sets. SIGMOD '00: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data.
Two-phase clustering process for outliers detection. Pattern Recognition Letters.
Mining top-n local outliers in large databases. Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Fast Outlier Detection in High Dimensional Spaces. PKDD '02: Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery.
Algorithms for Mining Distance-Based Outliers in Large Datasets. VLDB '98: Proceedings of the 24th International Conference on Very Large Data Bases.
Finding Intensional Knowledge of Distance-Based Outliers. VLDB '99: Proceedings of the 25th International Conference on Very Large Data Bases.
Enhancing Effectiveness of Outlier Detections for Low Density Patterns. PAKDD '02: Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining.
Distance-based outliers: algorithms and applications. The VLDB Journal: The International Journal on Very Large Data Bases.
Rule-based anomaly pattern detection for detecting disease outbreaks. Eighteenth National Conference on Artificial Intelligence.
Mining distance-based outliers in near linear time with randomization and a simple pruning rule. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
A Survey of Outlier Detection Methodologies. Artificial Intelligence Review.
On Local Spatial Outliers. ICDM '04: Proceedings of the Fourth IEEE International Conference on Data Mining.
Outlier Mining in Large High-Dimensional Data Sets. IEEE Transactions on Knowledge and Data Engineering.
iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Transactions on Database Systems (TODS).
Outlier detection in sensor networks. Proceedings of the 8th ACM International Symposium on Mobile Ad Hoc Networking and Computing.
Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters. IEEE Transactions on Computers.
Angle-based outlier detection in high-dimensional data. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Anomaly detection: A survey. ACM Computing Surveys (CSUR).
A Divide-and-Conquer Approach for Minimum Spanning Tree-Based Clustering. IEEE Transactions on Knowledge and Data Engineering.
ODDC: outlier detection using distance distribution clustering. PAKDD '07: Proceedings of the 2007 International Conference on Emerging Technologies in Knowledge Discovery and Data Mining.
Minimum spanning tree based spatial outlier mining and its applications. RSKT '08: Proceedings of the 3rd International Conference on Rough Sets and Knowledge Technology.
Ranking outliers using symmetric neighborhood relationship. PAKDD '06: Proceedings of the 10th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining.

Cited by (1):

Enhancing minimum spanning tree-based clustering by removing density-based outliers. Digital Signal Processing.

Abstract

Because outlier detection has important applications in data mining, many techniques have been developed for it. In this paper, we propose an efficient three-phase outlier detection technique. First, we modify the well-known k-means algorithm to efficiently construct a spanning tree that is very close to a minimum spanning tree (MST) of the data set. Second, the longest edges of this spanning tree are removed to form clusters. Based on the intuition that the data points in small clusters are most likely all outliers, these points are selected as outlier candidates. Finally, density-based local outlier factors (LOF) are computed for the outlier candidates and assessed to pinpoint the local outliers. Extensive experiments on real and synthetic data sets show that, compared with state-of-the-art methods, the proposed approach can efficiently identify both global and local outliers in large-scale data sets.
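The three phases can be illustrated with a minimal NumPy sketch, written under explicit assumptions. The abstract does not say how k-means is modified to build the near-minimal spanning tree, so this sketch assumes a divide-and-conquer scheme: partition the data with plain k-means, build an exact MST inside each partition with Prim's algorithm, then join partitions along an MST of their centroids using the shortest point-to-point bridging edge. All function names and the parameters `n_parts`, `n_cut`, `min_size`, `lof_k`, and `lof_threshold` are hypothetical. For brevity it uses dense distance matrices (O(n^2) memory) and computes LOF for every point before reading off the candidates, whereas the paper computes LOF only for the candidates; it is not the authors' efficient implementation.

```python
import numpy as np


def pairwise(A, B):
    """Dense Euclidean distance matrix between rows of A and rows of B."""
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))


def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one partition label per point."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = pairwise(X, centers).argmin(1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a partition empties
                centers[j] = X[labels == j].mean(0)
    return labels


def prim_edges(P, ids):
    """Exact MST of the points P (Prim's algorithm); edges use the labels in ids."""
    n = len(P)
    if n < 2:
        return []
    D = pairwise(P, P)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best, parent = D[0].copy(), np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        v = int(np.where(in_tree, np.inf, best).argmin())  # closest point outside the tree
        edges.append((ids[parent[v]], ids[v], best[v]))
        in_tree[v] = True
        closer = D[v] < best
        best = np.where(closer, D[v], best)
        parent = np.where(closer, v, parent)
    return edges


def approx_mst(X, n_parts):
    """ASSUMED divide-and-conquer spanning tree (not the paper's exact method):
    exact MSTs inside k-means partitions, joined along an MST of their centroids."""
    labels = kmeans(X, n_parts)
    groups = [g for g in (np.flatnonzero(labels == j) for j in range(n_parts)) if len(g)]
    edges = []
    for g in groups:
        edges += prim_edges(X[g], g)
    centroids = np.array([X[g].mean(0) for g in groups])
    for a, b, _ in prim_edges(centroids, np.arange(len(groups))):
        ga, gb = groups[a], groups[b]
        D = pairwise(X[ga], X[gb])
        i, j = np.unravel_index(D.argmin(), D.shape)  # shortest bridging edge
        edges.append((ga[i], gb[j], D[i, j]))
    return edges


def components(n, edges):
    """Connected-component label per point (union-find)."""
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for i, j, _ in edges:
        root[find(i)] = find(j)
    return np.array([find(x) for x in range(n)])


def lof_scores(X, k):
    """Local outlier factor of every point w.r.t. its k nearest neighbours."""
    D = pairwise(X, X)
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbour
    nn = np.argsort(D, axis=1)[:, :k]  # ties broken arbitrarily
    d_nn = np.take_along_axis(D, nn, axis=1)
    k_dist = d_nn[:, -1]  # k-distance of each point
    reach = np.maximum(d_nn, k_dist[nn])  # reach-dist(p, o) = max(k-dist(o), d(p, o))
    lrd = 1.0 / np.maximum(reach.mean(1), 1e-12)
    return lrd[nn].mean(1) / lrd


def detect_outliers(X, n_parts=4, n_cut=3, min_size=5, lof_k=5, lof_threshold=1.5):
    X = np.asarray(X, dtype=float)
    # Phase 1: approximate MST.  Phase 2: drop the n_cut longest edges.
    edges = sorted(approx_mst(X, n_parts), key=lambda e: e[2])
    comp = components(len(X), edges[:max(len(edges) - n_cut, 0)])
    ids, sizes = np.unique(comp, return_counts=True)
    candidates = np.flatnonzero(np.isin(comp, ids[sizes < min_size]))
    # Phase 3: keep only candidates whose LOF exceeds the (hypothetical) threshold.
    lof = lof_scores(X, lof_k)
    return [int(i) for i in candidates if lof[i] > lof_threshold]
```

On two tight, well-separated Gaussian blobs plus two isolated points, cutting the three longest tree edges leaves the isolated points as singleton clusters, and their large LOF values confirm them as outliers. Because the tree is only an approximation, the number of edges to cut and the small-cluster size are tuning choices in this sketch rather than values taken from the paper.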