Indexing and Mining Free Trees

  • Authors:
  • Yun Chi; Yirong Yang; Richard R. Muntz

  • Venue:
  • ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
  • Year:
  • 2003

Abstract

Tree structures are used extensively in domains such as computational biology, pattern recognition, computer networks, and so on. In this paper, we present an indexing technique for free trees and apply this indexing technique to the problem of mining frequent subtrees. We first define a novel representation, the canonical form, for rooted trees and extend the definition to free trees. We also introduce another concept, the canonical string, as a simpler representation for free trees in their canonical forms. We then apply our tree indexing technique to the frequent subtree mining problem and present FreeTreeMiner, a computationally efficient algorithm that discovers all frequently occurring subtrees in a database of free trees. We study the performance and the scalability of our algorithms through extensive experiments based on both synthetic data and datasets from two real applications: a dataset of chemical compounds and a dataset of Internet multicast trees.
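
To make the idea of a canonical string for a free tree concrete, the following is a minimal Python sketch. It is not the canonical form defined in the paper, which the abstract does not spell out. Instead it uses a well-known substitute: root the tree at its center (or try both centers when there are two), then encode each rooted tree with sorted child encodings, in the style of the Aho-Hopcroft-Ullman tree isomorphism encoding. The function names `tree_centers`, `encode_rooted`, and `canonical_string` are hypothetical, and vertex labels are assumed to be strings without parentheses.

```python
from collections import defaultdict


def tree_centers(adj):
    """Return the one or two center vertices of a free tree by peeling leaves."""
    degree = {v: len(ns) for v, ns in adj.items()}
    remaining = len(adj)
    leaves = [v for v, d in degree.items() if d <= 1]
    while remaining > 2:
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:
            for n in adj[leaf]:
                degree[n] -= 1
                if degree[n] == 1:
                    next_leaves.append(n)
        leaves = next_leaves
    return leaves


def encode_rooted(adj, labels, root, parent=None):
    """Encode a rooted labeled tree; sorting child codes removes sibling order."""
    child_codes = sorted(
        encode_rooted(adj, labels, c, root) for c in adj[root] if c != parent
    )
    return "(" + labels[root] + "".join(child_codes) + ")"


def canonical_string(edges, labels):
    """Illustrative canonical string of a free tree (assumed encoding, not the paper's)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    for v in labels:
        adj.setdefault(v, [])  # keeps a single-vertex tree in the adjacency map
    # A free tree has no root, so root at each center and keep the smallest code.
    return min(encode_rooted(adj, labels, c) for c in tree_centers(adj))


# Two drawings of the same labeled tree (C-C-O) with different vertex ids.
t1 = canonical_string([(1, 2), (2, 3)], {1: "C", 2: "C", 3: "O"})
t2 = canonical_string([(10, 20), (30, 10)], {10: "C", 20: "O", 30: "C"})
print(t1 == t2)  # the two strings coincide
```

Because isomorphic free trees map to the same string under this encoding, the string can serve as a dictionary key when counting candidate subtrees, which is the general role an index plays in frequent subtree mining. The recursion is adequate for small trees; a deep tree would need an iterative traversal.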