Tree structures have recently become a popular way to store large amounts of data. Clustering such data can facilitate operations such as storage, retrieval, rule extraction, and processing. In this paper, we propose a novel heuristic algorithm for clustering tree-structured data, called TreeCluster. The algorithm maintains a representative tree for each cluster, and it differs significantly from traditional methods based on computing the tree edit distance. TreeCluster compares each input tree T only with the representative trees of the clusters, which significantly reduces the running time. We analyze the efficiency of TreeCluster in terms of time complexity. Furthermore, we empirically evaluate the effectiveness and accuracy of TreeCluster in comparison with previous work. Our experimental results show that TreeCluster improves several cluster quality measures, including intra-cluster similarity, inter-cluster similarity, and the Dunn and Davies-Bouldin (DB) indices.
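The idea of comparing each input tree only with cluster representatives can be illustrated with a minimal sketch. This is not the TreeCluster algorithm itself, whose representative construction and similarity measure are not given in the abstract. The sketch assumes, purely for illustration, that a tree is a nested `(label, children)` tuple, that similarity is the Jaccard overlap of parent-child label pairs, that a representative is the union of its members' pairs, and that a hypothetical `threshold` decides when a new cluster is opened.

```python
from dataclasses import dataclass, field


def edge_set(tree):
    """Collect the (parent_label, child_label) pairs of a (label, children) tree."""
    label, children = tree
    edges = set()
    for child in children:
        edges.add((label, child[0]))
        edges |= edge_set(child)
    return edges


def similarity(a, b):
    """Jaccard overlap of two edge sets (an assumed stand-in measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class Cluster:
    # Hypothetical representative: the union of all member edge sets.
    representative: set
    members: list = field(default_factory=list)


def cluster_trees(trees, threshold=0.5):
    """Assign each tree to the most similar representative, or open a new cluster."""
    clusters = []
    for tree in trees:
        edges = edge_set(tree)
        best, best_sim = None, 0.0
        # One comparison per existing cluster, never per previously seen tree.
        for cluster in clusters:
            sim = similarity(edges, cluster.representative)
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best.members.append(tree)
            best.representative |= edges  # fold the new tree into the representative
        else:
            clusters.append(Cluster(representative=set(edges), members=[tree]))
    return clusters
```

With n input trees and k clusters, this sketch performs at most n·k representative comparisons, whereas a pairwise approach would need on the order of n² tree-to-tree comparisons. The result depends on the input order and on the assumed threshold.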