Enhancing minimum spanning tree-based clustering by removing density-based outliers

Authors:
Xiaochun Wang;Xia Li Wang;Cong Chen;D. Mitchell Wilkes
Affiliations:
Xian Jiaotong University, People's Republic of China;Changan University, People's Republic of China;Xian Jiaotong University, People's Republic of China;Vanderbilt University, USA
Venue:
Digital Signal Processing
Year:
2013

Abstract

Traditional minimum spanning tree (MST)-based clustering algorithms partition a data set using only the information carried by the edges of the tree. With so little information about the structure underlying the data, these algorithms are vulnerable to outliers. To address this issue, this paper presents a simple yet efficient MST-inspired clustering algorithm. It computes a local density factor for each data point during the construction of an MST and discards outliers, i.e., points whose local density factor is larger than a threshold, to increase the separation between clusters. The algorithm is easy to implement: the only k-nearest neighbor search structure it requires is an implementation of iDistance. Experiments on both small low-dimensional data sets and large high-dimensional data sets demonstrate the efficacy of the method.
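
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions. The abstract does not define the local density factor, so the sketch substitutes a simple ratio: a point's mean k-nearest-neighbor distance divided by the average of the same quantity over its neighbors. A brute-force neighbor search stands in for iDistance, the factor is computed before the MST is built rather than during its construction, and the final partition is obtained by cutting the longest MST edges. All function names, the parameters `k`, `threshold`, and `n_clusters`, and the demo data are hypothetical.

```python
import math


def knn_mean_distances(points, k):
    """Brute-force k-NN (stand-in for iDistance): neighbor lists and mean k-NN distances."""
    neighbors, mean_dist = [], []
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        neighbors.append([j for _, j in nearest])
        mean_dist.append(sum(d for d, _ in nearest) / k)
    return neighbors, mean_dist


def density_factors(points, k):
    """Assumed factor: own mean k-NN distance over the neighbors' average mean k-NN distance."""
    neighbors, mean_dist = knn_mean_distances(points, k)
    factors = []
    for i in range(len(points)):
        reference = sum(mean_dist[j] for j in neighbors[i]) / k
        factors.append(mean_dist[i] / reference if reference > 0 else 1.0)
    return factors


def mst_edges(points, keep):
    """Prim's algorithm restricted to the indices in `keep`; returns (weight, u, v) edges."""
    keep = list(keep)
    if not keep:
        return []
    root = keep[0]
    # best[v] = (distance to the tree so far, tree vertex realizing it)
    best = {v: (math.dist(points[root], points[v]), root) for v in keep[1:]}
    edges = []
    while best:
        v = min(best, key=lambda x: best[x][0])
        weight, u = best.pop(v)
        edges.append((weight, u, v))
        for x in best:
            d = math.dist(points[v], points[x])
            if d < best[x][0]:
                best[x] = (d, v)
    return edges


def cluster(points, k=3, threshold=2.0, n_clusters=2):
    """Discard points whose factor exceeds `threshold`, then cut the longest MST edges."""
    factors = density_factors(points, k)
    keep = [i for i, f in enumerate(factors) if f <= threshold]
    edges = sorted(mst_edges(points, keep))
    kept_edges = edges[:max(0, len(edges) - (n_clusters - 1))]

    parent = {i: i for i in keep}  # union-find over the surviving points

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, u, v in kept_edges:
        parent[find(u)] = find(v)

    labels = {i: find(i) for i in keep}
    outliers = [i for i in range(len(points)) if i not in labels]
    return labels, outliers


if __name__ == "__main__":
    # Two compact groups with one isolated point lying between them.
    demo = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 0), (10, 1), (11, 0), (11, 1),
            (5.5, 0.5)]
    labels, outliers = cluster(demo, k=3, threshold=2.0, n_clusters=2)
    print("labels:", labels)
    print("outliers:", outliers)
```

In this sketch the isolated point is removed before the tree is built, so it cannot serve as a bridge between the two groups, which is the separation effect the abstract describes. The brute-force search costs quadratic time and is only suitable for small inputs.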