Frequent Subtree Mining - An Overview

  • Authors:
  • Yun Chi; Richard R. Muntz; Siegfried Nijssen; Joost N. Kok

  • Affiliations:
  • Department of Computer Science, University of California, Los Angeles, CA 90095, USA (ychi@cs.ucla.edu)
  • Department of Computer Science, University of California, Los Angeles, CA 90095, USA (muntz@cs.ucla.edu)
  • Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands (snijssen@liacs.nl) (corresponding author)
  • Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands (joost@liacs.nl)

  • Venue:
  • Fundamenta Informaticae - Advances in Mining Graphs, Trees and Sequences
  • Year:
  • 2005

Abstract

Mining frequent subtrees from databases of labeled trees is a new research field with many practical applications in areas such as computer networks, Web mining, bioinformatics, and XML document mining. These applications share a need for the greater expressive power of labeled trees to capture complex relations among data entities. Although frequent subtree mining is a more difficult task than frequent itemset mining, most existing frequent subtree mining algorithms borrow techniques from the relatively mature area of association rule mining. This paper provides an overview of a broad range of tree mining algorithms. We focus on the common theoretical foundations of current frequent subtree mining algorithms and their relationship with their counterparts in frequent itemset mining. When comparing the algorithms, we categorize them according to their problem definitions and the techniques they employ for the various subtasks of the subtree mining problem. In addition, we present a thorough performance study of a representative family of algorithms.
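To make the basic task concrete, here is a minimal sketch of the support-counting step that every frequent subtree miner needs. It fixes one problem definition among the several the overview compares: rooted, ordered, labeled trees; induced subtrees, where parent-child edges and sibling order must be preserved; and transaction-based support, meaning the number of database trees that contain the pattern at least once. The `Node` class and the functions `matches_at`, `occurs_in` and `support` are hypothetical names chosen for illustration. They do not come from any algorithm surveyed in the paper, and real miners avoid this brute-force check by using specialized candidate generation and occurrence bookkeeping.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a rooted, ordered, labeled tree (hypothetical helper type)."""
    label: str
    children: list["Node"] = field(default_factory=list)


def matches_at(p: Node, t: Node) -> bool:
    """True if pattern p occurs as an induced ordered subtree rooted at node t."""
    if p.label != t.label:
        return False
    i = 0  # next candidate position among t's children
    for pc in p.children:
        # Take the leftmost remaining child of t that can host pc. Greedy choice
        # is safe because distinct children of t root disjoint subtrees, and
        # advancing i keeps the pattern's sibling order.
        while i < len(t.children) and not matches_at(pc, t.children[i]):
            i += 1
        if i == len(t.children):
            return False
        i += 1
    return True


def occurs_in(p: Node, t: Node) -> bool:
    """True if p matches at t or anywhere below it."""
    return matches_at(p, t) or any(occurs_in(p, c) for c in t.children)


def support(p: Node, database: list[Node]) -> int:
    """Transaction-based support: number of trees containing p at least once."""
    return sum(1 for t in database if occurs_in(p, t))


if __name__ == "__main__":
    db = [
        Node("a", [Node("b"), Node("c", [Node("d")])]),
        Node("a", [Node("c"), Node("b")]),  # b and c in the "wrong" order
        Node("b", [Node("a", [Node("b"), Node("c")])]),
    ]
    print(support(Node("a", [Node("b"), Node("c")]), db))  # 2
    print(support(Node("a", [Node("b")]), db))             # 3
```

The second tree does not contain the pattern a(b, c) under ordered semantics, which is why its support is 2. The smaller pattern a(b) has support 3. Under this definition, support can never increase when a pattern grows. This anti-monotonicity is the property that lets Apriori-style pruning from frequent itemset mining carry over to subtree mining.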