Tree-structured Partitioning Based on Splitting Histograms of Distances

  • Authors:
  • Longin Jan Latecki; Rajagopal Venugopal; Marc Sobel; Steve Horvath

  • Venue:
  • ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
  • Year:
  • 2003

Abstract

We propose a novel clustering algorithm that is similar in spirit to classification trees. The data is recursively split using a criterion that applies a discrete curve evolution method to the histogram of distances. The algorithm can be depicted through tree diagrams with triple splits. Leaf nodes represent either clusters or sets of observations that cannot yet be clearly assigned to a cluster. After constructing the tree, unclassified data points are mapped to their closest clusters. The algorithm has several advantages. First, it deals effectively with observations that cannot be unambiguously assigned to a cluster by allowing a "margin of error". Second, it automatically determines the number of clusters; apart from the margin of error, the user only needs to specify the minimal cluster size but not the number of clusters. Third, it is linear with respect to the number of data points and thus suitable for very large data sets. Experiments involving both simulated and real data from different domains show that the proposed method is effective and efficient.
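
The abstract does not spell out the splitting criterion, so the following is only a rough sketch of what a single triple split and the final assignment step might look like. It is not the authors' method: Otsu's between-class variance criterion stands in for discrete curve evolution, and the function names (triple_split, assign_to_closest), the reference point ref, the bin count, the margin measured in distance units, and the use of centroids for "closest cluster" are all assumptions made for illustration. Recursion and the minimal cluster size check are omitted.

```python
import math

def triple_split(points, ref, bins=10, margin=1.0):
    """Split points into (near, uncertain, far) by their distance to ref.

    Hypothetical helper: ref, bins and margin are illustrative parameters.
    """
    dists = [math.dist(p, ref) for p in points]
    lo, hi = min(dists), max(dists)
    if hi == lo:                       # all points equidistant: nothing to split
        return list(points), [], []
    width = (hi - lo) / bins
    counts = [0] * bins
    for d in dists:                    # one linear pass builds the histogram
        counts[min(int((d - lo) / width), bins - 1)] += 1

    # Otsu's criterion (a stand-in for discrete curve evolution): score each
    # candidate cut t by the between-class variance of the bin indices.
    n = len(points)
    total = sum(i * c for i, c in enumerate(counts))
    scores = {}
    w0 = s0 = 0
    for t in range(1, bins):
        w0 += counts[t - 1]
        s0 += (t - 1) * counts[t - 1]
        w1 = n - w0
        if w0 == 0 or w1 == 0:
            continue
        scores[t] = w0 * w1 * (s0 / w0 - (total - s0) / w1) ** 2
    best = max(scores.values())
    ties = [t for t, s in scores.items() if s == best]
    # When several cuts tie (an empty gap), take the middle of the gap.
    cut = lo + ties[len(ties) // 2] * width

    # Triple split: points within the margin of the cut stay unclassified.
    near = [p for p, d in zip(points, dists) if d < cut - margin]
    far = [p for p, d in zip(points, dists) if d > cut + margin]
    uncertain = [p for p, d in zip(points, dists)
                 if cut - margin <= d <= cut + margin]
    return near, uncertain, far

def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))

def assign_to_closest(unclassified, clusters):
    """Append each unclassified point to the cluster with the nearest centroid.

    Centroids are computed once, before any point is assigned (an assumption).
    """
    centers = [centroid(c) for c in clusters]
    for p in unclassified:
        k = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[k].append(p)
    return clusters
```

In this sketch, the histogram is built in one pass over the data and the cut search touches only the bins, so a single split costs time linear in the number of points for a fixed bin count. Calling triple_split on each resulting group in turn would give the tree structure described above, provided a stopping rule such as the minimal cluster size is added.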