Approximate minimum spanning tree clustering in high-dimensional space

Authors:
Chih Lai;Taras Rafa;Dwight E. Nelson
Affiliations:
Department of Biology, Graduate Programs in Software Engineering, University of St. Thomas, St. Paul, MN, USA. Tel.: +1 651 962 5573; E-mail: {clai,trafa,denelson}@stthomas.edu
Venue:
Intelligent Data Analysis
Year:
2009

Abstract

Minimum spanning tree (MST) clustering sequentially inserts the nearest points in the $R^{d}$ space into a list, which is then divided into clusters according to the desired criteria. This insertion order, however, can be relaxed, provided that approximately nearby points in a dense area are inserted adjacently into the list before distant points in other areas. Based on this observation, we propose an approximate clustering method in which a new Approximate MST (AMST) is repeatedly built, in at most $(d+1)$ iterations, from two sources: a new Hilbert curve created from the $N$ carefully shifted data points, and the previous AMST, which holds cumulative vicinity information derived from earlier iterations. Although the final AMST may not completely match a true MST built by an $O(N^{2})$ algorithm, most mismatches occur locally within individual data groups and are therefore unimportant for clustering. Our experiments on synthetic datasets and on animal motion vectors extracted from surveillance videos show that high-quality clusters can be obtained efficiently with this approximation method.
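
For orientation, the exact baseline that the abstract compares against can be sketched as follows. This is a minimal illustration only, not the authors' AMST method: it builds a true MST with an $O(N^{2})$ Prim's algorithm and then splits it into clusters. The function names `prim_mst` and `mst_clusters`, and the splitting criterion (dropping the k-1 longest edges), are assumptions made for this sketch; the abstract only says the list is divided "by using desired criteria".

```python
import math


def prim_mst(points):
    """Return MST edges as (weight, i, j) using an O(N^2) Prim's algorithm."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from the tree to each point
    parent = [-1] * n       # tree endpoint that realizes best[v]
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Insert the point nearest to the current tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Relax distances of the remaining points against the new tree member.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_clusters(points, k):
    """Label points by cluster after dropping the k-1 longest MST edges.

    The cut criterion is an assumption of this sketch, not taken from the paper.
    """
    edges = sorted(prim_mst(points))
    kept = edges[:max(len(edges) - (k - 1), 0)]
    root = list(range(len(points)))  # union-find forest over the kept edges

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(mst_clusters(pts, 2))
```

The returned labels are arbitrary representative indices, so only equality between labels is meaningful. The quadratic cost of the inner loops is what an approximate construction such as the AMST aims to avoid.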