A heuristic algorithm for clustering rooted ordered trees

Authors:
Mostafa Haghir Chehreghani;Masoud Rahgozar;Caro Lucas;Morteza Haghir Chehreghani
Affiliations:
(Corresponding author) Database Research Group, Faculty of ECE, School of Engineering, University of Tehran, Tehran, Iran. E-mail: m.haghir@ece.ut.ac.ir;Database Research Group, Control and Intelligent Processing Center of Excellence, Faculty of ECE, School of Engineering, University of Tehran, Tehran, Iran. E-mail: rahgozar@ut.ac.ir/ lucas@ipm.ir;Database Research Group, Control and Intelligent Processing Center of Excellence, Faculty of ECE, School of Engineering, University of Tehran, Tehran, Iran. E-mail: rahgozar@ut.ac.ir/ lucas@ipm.ir;Department of CE, Sharif University of Technology, Tehran, Iran. E-mail: haghir@ce.sharif.edu
Venue:
Intelligent Data Analysis
Year:
2007

Abstract

Tree structures have recently become a popular way of storing large amounts of data. Clustering such data can facilitate operations such as storage, retrieval, rule extraction, and processing. In this paper, we propose TreeCluster, a novel heuristic algorithm for clustering tree-structured data. The algorithm maintains a representative tree for each cluster. It differs significantly from traditional methods, which are based on computing the tree edit distance. TreeCluster compares each input tree T only with the representative trees of the clusters, which significantly reduces the running time. We show the efficiency of TreeCluster in terms of time complexity. Furthermore, we empirically evaluate the effectiveness and accuracy of TreeCluster in comparison with previous work. Our experimental results show that TreeCluster improves several cluster quality measures, including intra-cluster similarity, inter-cluster similarity, and the Dunn and Davies-Bouldin (DB) indices.
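
The idea of comparing each input tree only with one representative per cluster can be illustrated with a minimal Python sketch. This is not the TreeCluster algorithm itself: the abstract does not specify the similarity measure, how representatives are built or updated, or any threshold. The tree encoding, the `similarity` function (a Jaccard overlap of parent-child label pairs that ignores sibling order), the choice of the first tree as a fixed representative, and the `threshold` parameter are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed encoding: a rooted ordered tree is (label, [child trees]).
Tree = Tuple[str, list]


def edges(tree: Tree) -> set:
    """Collect the set of (parent_label, child_label) pairs in the tree."""
    label, children = tree
    found = set()
    for child in children:
        found.add((label, child[0]))
        found |= edges(child)
    return found


def similarity(a: Tree, b: Tree) -> float:
    """Placeholder similarity (an assumption, not the paper's measure):
    Jaccard overlap of parent-child label pairs."""
    ea, eb = edges(a), edges(b)
    if not ea and not eb:
        # Two single-node trees: compare root labels only.
        return 1.0 if a[0] == b[0] else 0.0
    return len(ea & eb) / len(ea | eb)


def cluster_by_representative(trees: List[Tree],
                              threshold: float = 0.5) -> List[List[Tree]]:
    """Assign each tree to the most similar cluster representative,
    or open a new cluster when no representative is similar enough."""
    representatives: List[Tree] = []   # one tree per cluster
    clusters: List[List[Tree]] = []
    for tree in trees:
        best_index, best_score = -1, 0.0
        # Only k comparisons per tree (k = current number of clusters),
        # instead of comparing against every previously seen tree.
        for i, rep in enumerate(representatives):
            score = similarity(tree, rep)
            if score > best_score:
                best_index, best_score = i, score
        if best_index >= 0 and best_score >= threshold:
            clusters[best_index].append(tree)
        else:
            # Assumption: the first tree of a cluster stays its representative.
            representatives.append(tree)
            clusters.append([tree])
    return clusters
```

With n input trees and k clusters, a scheme of this shape performs on the order of n·k similarity computations rather than the n² pairwise comparisons that a full distance matrix requires, which is the kind of saving the abstract attributes to representative trees.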