Split-Order Distance for Clustering and Classification Hierarchies

  • Authors:
  • Qi Zhang; Eric Yi Liu; Abhishek Sarkar; Wei Wang

  • Affiliations:
  • Department of Computer Science, University of North Carolina at Chapel Hill (all authors)

  • Venue:
  • SSDBM 2009 Proceedings of the 21st International Conference on Scientific and Statistical Database Management
  • Year:
  • 2009

Abstract

Clustering and classification hierarchies are structures that organize a set of objects. Because multiple hierarchies may be derived over the same set of objects, computing the distance between hierarchies is an important task. In this paper, we model classification and clustering hierarchies as rooted, leaf-labeled, unordered trees. We propose a novel distance metric, Split-Order distance, to evaluate the difference in organizational structure between two hierarchies over the same set of leaf objects. Split-Order distance reflects the order in which subsets of the tree leaves are differentiated from each other, and it can be used to explain the relationships between the leaf objects. We also propose an efficient algorithm that computes the Split-Order distance between two trees in O(n²d⁴) time, where n is the number of leaves and d is the maximum number of children of any node. Our experiments on both real and synthetic data demonstrate the efficiency and effectiveness of our algorithm.
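The abstract does not define Split-Order distance itself, so the following is only a minimal sketch of the tree model it describes: hierarchies as rooted, leaf-labeled, unordered trees, and the idea of recording where pairs of leaves are first separated. It is NOT the authors' Split-Order distance or their O(n²d⁴) algorithm. The nested-list encoding, the function names `split_depths` and `split_depth_disagreement`, and the pair-counting comparison are all hypothetical stand-ins chosen for illustration, and leaf labels are assumed to be distinct.

```python
from itertools import combinations
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: a node is either a leaf label (str) or a list of
# child nodes. Child order carries no meaning (unordered-tree model).
Tree = Union[str, List["Tree"]]
Pair = Tuple[str, str]


def split_depths(tree: Tree) -> Dict[Pair, int]:
    """For every pair of (distinct) leaf labels, return the depth of the node
    at which the two leaves are first separated (their lowest common ancestor)."""
    result: Dict[Pair, int] = {}

    def visit(node: Tree, depth: int) -> List[str]:
        if isinstance(node, str):
            return [node]
        child_leaves = [visit(child, depth + 1) for child in node]
        # Leaves that fall under different children are split at this node.
        for i, j in combinations(range(len(child_leaves)), 2):
            for a in child_leaves[i]:
                for b in child_leaves[j]:
                    # Sort the pair so the key does not depend on child order.
                    key = (a, b) if a < b else (b, a)
                    result[key] = depth
        return [leaf for leaves in child_leaves for leaf in leaves]

    visit(tree, 0)
    return result


def split_depth_disagreement(t1: Tree, t2: Tree) -> int:
    """Hypothetical comparison (not Split-Order distance): count the leaf pairs
    that are separated at different depths in the two trees."""
    d1, d2 = split_depths(t1), split_depths(t2)
    if d1.keys() != d2.keys():
        raise ValueError("both trees must have the same leaf set")
    return sum(1 for pair in d1 if d1[pair] != d2[pair])


if __name__ == "__main__":
    t1 = [["a", "b"], ["c", "d"]]
    t2 = [["a", "c"], ["b", "d"]]
    print(split_depth_disagreement(t1, t2))
```

Because every pair key is sorted and depths depend only on ancestry, reordering children leaves the result unchanged, which matches the unordered-tree assumption. The pairwise enumeration makes the cost grow with the number of leaf pairs, so this toy comparison says nothing about the efficiency of the paper's algorithm.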