Frequent Subtree Mining - An Overview

Authors:
Yun Chi;Richard R. Muntz;Siegfried Nijssen;Joost N. Kok
Affiliations:
Department of Computer Science, University of California, Los Angeles, CA 90095, USA. ychi@cs.ucla.edu
Department of Computer Science, University of California, Los Angeles, CA 90095, USA. muntz@cs.ucla.edu
(Corresponding author) Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands. snijssen@liacs.nl
Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands. joost@liacs.nl
Venue:
Fundamenta Informaticae - Advances in Mining Graphs, Trees and Sequences
Year:
2005

Abstract

Mining frequent subtrees from databases of labeled trees is a new research field with many practical applications in areas such as computer networks, Web mining, bioinformatics, and XML document mining. These applications share a need for the greater expressive power of labeled trees to capture the complex relations among data entities. Although frequent subtree mining is a more difficult task than frequent itemset mining, most existing frequent subtree mining algorithms borrow techniques from the relatively mature area of association rule mining. This paper provides an overview of a broad range of tree mining algorithms. We focus on the common theoretical foundations of current frequent subtree mining algorithms and their relationship to their counterparts in frequent itemset mining. When comparing the algorithms, we categorize them according to their problem definitions and the techniques they employ to solve the various subtasks of the subtree mining problem. We also present a thorough performance study of a representative family of algorithms.