Approximate minimum spanning tree clustering in high-dimensional space

  • Authors:
  • Chih Lai;Taras Rafa;Dwight E. Nelson

  • Affiliations:
  • Department of Biology, Graduate Programs in Software Engineering, University of St. Thomas, St. Paul, MN, USA. Tel.: +1 651 962 5573; E-mail: {clai,trafa,denelson}@stthomas.edu

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2009

Abstract

Minimum spanning tree (MST) clustering sequentially inserts the nearest points in the $\mathbb{R}^{d}$ space into a list, which is then divided into clusters according to the desired criteria. This insertion order, however, can be relaxed, provided that approximately nearby points in a dense area are inserted adjacently into the list before distant points in other areas. Based on this observation, we propose an approximate clustering method in which a new Approximate MST (AMST) is repeatedly built, for at most $(d+1)$ iterations, from two sources: a new Hilbert curve created from the carefully shifted $N$ data points, and the previous AMST, which holds cumulative vicinity information derived from earlier iterations. Although the final AMST may not completely match a true MST built by an $O(N^{2})$ algorithm, most mismatches occur locally within individual data groups and are therefore unimportant for clustering. Our experiments on synthetic datasets and on animal motion vectors extracted from surveillance videos show that high-quality clusters can be obtained efficiently with this approximation method.
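
For orientation, the exact baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration and not the authors' AMST method: it builds a true MST with an $O(N^{2})$ Prim-style pass and then splits it into clusters. The function names `prim_mst` and `mst_clusters` are hypothetical, and the splitting rule (cut the k-1 longest edges) is an assumed stand-in for the "desired criteria", which the abstract does not specify.

```python
import math


def prim_mst(points):
    """Exact MST via O(N^2) Prim; returns a list of (weight, u, v) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from the tree to each point
    parent = [-1] * n       # tree endpoint that achieves best[i]
    if n:
        best[0] = 0.0
    edges = []
    for _ in range(n):
        # Insert the point nearest to the current tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Relax distances of the remaining points against the new tree point.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_clusters(points, k):
    """Assumed criterion: cut the k-1 longest MST edges, label the components."""
    edges = sorted(prim_mst(points))
    kept = edges[: len(edges) - (k - 1)]

    # Union-find over the kept edges recovers the connected components.
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for _, u, v in kept:
        root[find(u)] = find(v)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(mst_clusters(pts, k=2))
```

The quadratic cost sits in the nested loop of `prim_mst`, which is the part the abstract proposes to avoid by building the tree approximately from Hilbert curve orderings.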