Canonical forms for labelled trees and their applications in frequent subtree mining

  • Authors:
  • Yun Chi;Yirong Yang;Richard R. Muntz

  • Affiliations:
  • Department of Computer Science, University of California, 90095, Los Angeles, CA, USA;Department of Computer Science, University of California, 90095, Los Angeles, CA, USA;Department of Computer Science, University of California, 90095, Los Angeles, CA, USA

  • Venue:
  • Knowledge and Information Systems
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, computer networks, and so on. In this paper, we first present two canonical forms for labelled rooted unordered trees–the breadth-first canonical form (BFCF) and the depth-first canonical form (DFCF). Then the canonical forms are applied to the frequent subtree mining problem. Based on the BFCF, we develop a vertical mining algorithm, RootedTreeMiner, to discover all frequently occurring subtrees in a database of labelled rooted unordered trees. The RootedTreeMiner algorithm uses an enumeration tree to enumerate all (frequent) labelled rooted unordered subtrees. Next, we extend the definition of the DFCF to labelled free trees and present an Apriori-like algorithm, FreeTreeMiner, to discover all frequently occurring subtrees in a database of labelled free trees. Finally, we study the performance and the scalability of our algorithms through extensive experiments based on both synthetic data and datasets from real applications.