Mining Closed and Maximal Frequent Subtrees from Databases of Labeled Rooted Trees

Authors:
Yun Chi;Yi Xia;Yirong Yang;Richard R. Muntz
Affiliations:
IEEE;-;-;IEEE
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2005


Abstract

Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, and computer networks. One important problem in mining databases of trees is to find frequently occurring subtrees. Because of combinatorial explosion, the number of frequent subtrees usually grows exponentially with subtree size, so mining all frequent subtrees becomes infeasible for large tree sizes. In this paper, we present CMTreeMiner, a computationally efficient algorithm that discovers only closed and maximal frequent subtrees in a database of labeled rooted trees, where the rooted trees can be either ordered or unordered. The algorithm mines both closed and maximal frequent subtrees by traversing an enumeration tree that systematically enumerates all frequent subtrees. Several techniques are proposed to prune the branches of the enumeration tree that do not correspond to closed or maximal frequent subtrees. Heuristic techniques are used to order the computation so that relatively expensive steps are avoided as much as possible. We study the performance of our algorithm through extensive experiments, using both synthetic data and data sets from real applications. The experimental results show that our algorithm is very efficient in reducing the search space and quickly discovers all closed and maximal frequent subtrees.
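To make the terms in the abstract concrete, the following is a minimal brute-force sketch of what "closed" and "maximal" mean for frequent subtrees. It is not CMTreeMiner: it uses no enumeration tree, pruning, or heuristics. It relies on the standard definitions (a frequent subtree is closed if no proper supertree has the same support, and maximal if no proper supertree is frequent) and on a simplifying assumption that node labels are unique within each tree, so a rooted unordered tree can be represented as a set of (parent, child) edges. All function names and the toy database are hypothetical, and single-node patterns are ignored.

```python
from itertools import combinations

def is_connected(edges):
    """True if the edge set forms one connected subtree (unique labels assumed)."""
    edges = set(edges)
    if not edges:
        return False
    nodes = {next(iter(edges))[0]}
    grew = True
    while grew:
        grew = False
        for parent, child in edges:
            # Absorb an edge when exactly one endpoint has been reached so far.
            if (parent in nodes) != (child in nodes):
                nodes.update((parent, child))
                grew = True
    return all(p in nodes and c in nodes for p, c in edges)

def frequent_subtrees(db, minsup):
    """Brute force: map each frequent connected edge set to its support."""
    candidates = set()
    for tree in db:
        for k in range(1, len(tree) + 1):
            for combo in combinations(sorted(tree), k):
                if is_connected(combo):
                    candidates.add(frozenset(combo))
    # Support = number of database trees that contain the pattern.
    support = {c: sum(c <= tree for tree in db) for c in candidates}
    return {c: s for c, s in support.items() if s >= minsup}

def closed_and_maximal(freq):
    """Filter frequent subtrees down to the closed and the maximal ones."""
    closed = {t for t, s in freq.items()
              if not any(t < u and freq[u] == s for u in freq)}
    maximal = {t for t in freq if not any(t < u for u in freq)}
    return closed, maximal

# Toy database of three rooted trees, each given as (parent, child) edges.
db = [
    frozenset({("A", "B"), ("A", "C"), ("B", "D")}),
    frozenset({("A", "B"), ("A", "C")}),
    frozenset({("A", "B"), ("B", "D")}),
]
freq = frequent_subtrees(db, minsup=2)
closed, maximal = closed_and_maximal(freq)
print(len(freq), len(closed), len(maximal))
```

In this toy database, every maximal subtree is also closed, and the closed set is smaller than the full frequent set, which is the reduction the abstract relies on. The brute-force candidate generation is exponential in tree size, which is exactly the cost the pruning techniques described above are meant to avoid.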