A heuristic algorithm for clustering rooted ordered trees

  • Authors:
  • Mostafa Haghir Chehreghani;Masoud Rahgozar;Caro Lucas;Morteza Haghir Chehreghani

  • Affiliations:
  • (Corresponding author) Database Research Group, Faculty of ECE, School of Engineering, University of Tehran, Tehran, Iran. E-mail: m.haghir@ece.ut.ac.ir;Database Research Group, Control and Intelligent Processing Center Of Excellence, Faculty of ECE, School of Engineering, University of Tehran, Tehran, Iran. E-mail: rahgozar@ut.ac.ir/ lucas@ipm.ir;Database Research Group, Control and Intelligent Processing Center Of Excellence, Faculty of ECE, School of Engineering, University of Tehran, Tehran, Iran. E-mail: rahgozar@ut.ac.ir/ lucas@ipm.ir;Department of CE, Sharif University of Technology, Tehran, Iran. E-mail: haghir@ce.sharif.edu

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2007

Abstract

Recently, tree structures have become a popular way to store huge amounts of data. Clustering these data can facilitate operations such as storage, retrieval, rule extraction and processing. In this paper, we propose a novel heuristic algorithm for clustering tree-structured data, called TreeCluster. The algorithm maintains a representative tree for each cluster, which distinguishes it significantly from traditional methods based on computing tree edit distance. TreeCluster compares each input tree T only with the representative trees of the clusters, which significantly reduces the running time. We show the efficiency of TreeCluster in terms of time complexity. Furthermore, we empirically evaluate the effectiveness and accuracy of the TreeCluster algorithm in comparison with previous work. Our experimental results show that TreeCluster improves several cluster quality measures, such as intra-cluster similarity, inter-cluster similarity, and the Dunn and Davies-Bouldin (DB) indices.
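The abstract does not specify how representative trees are built or how similarity is measured, so the following is only a minimal sketch of the general idea of comparing each input tree against one representative per cluster rather than against every other tree. Everything in it is an assumption made for illustration and is not the authors' method: the `Node` class, the `edge_set` feature extraction, the Jaccard `similarity` measure, the `threshold` parameter, and the choice to keep the first tree of a cluster as its fixed representative. The stand-in features also ignore sibling order, which a real algorithm for ordered trees would have to take into account.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A labelled tree node (hypothetical structure for this sketch)."""
    label: str
    children: list = field(default_factory=list)


def edge_set(tree):
    """Collect (parent label, child label) pairs as a stand-in feature set."""
    edges = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        for child in node.children:
            edges.add((node.label, child.label))
            stack.append(child)
    return edges


def similarity(a, b):
    """Jaccard similarity of two edge sets (an assumed measure, not the paper's)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_trees(trees, threshold=0.5):
    """Assign each tree to its most similar representative, or open a new cluster.

    Each tree is compared only with the current representatives, so the number
    of comparisons is (number of trees) x (number of clusters).
    """
    representatives = []  # one feature set per cluster, never updated here
    assignments = []
    for tree in trees:
        features = edge_set(tree)
        best, best_sim = None, -1.0
        for idx, rep in enumerate(representatives):
            sim = similarity(features, rep)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best is not None and best_sim >= threshold:
            assignments.append(best)
        else:
            # No representative is close enough: this tree starts a new cluster.
            representatives.append(features)
            assignments.append(len(representatives) - 1)
    return assignments
```

With n input trees and k clusters, this loop performs at most n·k similarity computations, compared with on the order of n² pairwise comparisons for an approach that compares every pair of trees.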