HybridTreeMiner: An Efficient Algorithm for Mining Frequent Rooted Trees and Free Trees Using Canonical Forms

  • Authors: Yun Chi; Yirong Yang; Richard R. Muntz

  • Affiliations: University of California, Los Angeles (all authors)

  • Venue: SSDBM '04 Proceedings of the 16th International Conference on Scientific and Statistical Database Management
  • Year: 2004

Abstract

Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, computer networks, and so on. In this paper, we present HybridTreeMiner, a computationally efficient algorithm that discovers all frequently occurring subtrees in a database of rooted unordered trees. The algorithm mines frequent subtrees by traversing an enumeration tree that systematically enumerates all subtrees. The enumeration tree is defined based on a novel canonical form for rooted unordered trees: the breadth-first canonical form (BFCF). By extending the definitions of our canonical form and enumeration tree to free trees, our algorithm can efficiently handle databases of free trees as well. We study the performance of our algorithms through extensive experiments based on both synthetic data and datasets from real applications. The experiments show that our algorithm is competitive in comparison to known rooted tree mining algorithms and is faster by one to two orders of magnitude compared to a known algorithm for mining frequent free trees.
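
To illustrate why a canonical form matters for rooted unordered trees, the sketch below maps every child ordering of the same tree to a single string, so that equivalent trees can be compared or counted by simple string equality. It is not the paper's BFCF: it uses a simpler recursive, sorted-children encoding chosen only for illustration. The `(label, children)` tuple representation and the function names `canonical` and `same_unordered_tree` are assumptions, as is the requirement that labels contain no parentheses.

```python
from typing import List, Tuple

# Hypothetical representation: a rooted labeled tree is (label, [child, ...]).
# Labels are assumed to contain no parentheses.
Tree = Tuple[str, List["Tree"]]


def canonical(tree: Tree) -> str:
    """Encode a rooted unordered tree so that sibling order does not matter.

    Illustrative sorted-children encoding, not the breadth-first
    canonical form (BFCF) defined in the paper.
    """
    label, children = tree
    # Sorting the children's encodings removes any dependence on sibling order.
    encoded = sorted(canonical(child) for child in children)
    return label + "(" + "".join(encoded) + ")"


def same_unordered_tree(a: Tree, b: Tree) -> bool:
    """Two rooted unordered trees are equivalent iff their encodings match."""
    return canonical(a) == canonical(b)


# t1 and t2 differ only in sibling order; t3 has a different shape.
t1: Tree = ("A", [("B", []), ("C", [("D", [])])])
t2: Tree = ("A", [("C", [("D", [])]), ("B", [])])
t3: Tree = ("A", [("B", [("D", [])]), ("C", [])])

print(canonical(t1))
print(same_unordered_tree(t1, t2), same_unordered_tree(t1, t3))
```

A miner that enumerates candidate subtrees can use such an encoding as a dictionary key, so that each distinct unordered subtree is generated and counted once regardless of the order in which its children were encountered.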