Recent research in data mining has progressed from mining frequent itemsets to more general and structured patterns such as trees and graphs. In this paper, we address the problem of frequent subtree mining, which has proven useful in a wide range of applications such as bioinformatics, XML processing, computational linguistics, and web usage mining. We propose novel algorithms to mine frequent subtrees from a database of rooted trees. We evaluate two popular sequential encodings of trees as a basis for systematically generating and evaluating candidate patterns. The proposed approach is generic and can be used to mine embedded or induced subtrees that can be labeled, unlabeled, ordered, unordered, or edge-labeled. Our algorithms are highly cache-conscious because they rely on compact and simple array-based data structures. We typically observe L1 and L2 cache hit rates above 99%. Experimental evaluation shows that our algorithms achieve speedups of up to several orders of magnitude on real datasets compared to state-of-the-art tree mining algorithms.
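To make the idea of a sequential tree encoding concrete, the following is a minimal sketch of one widely used scheme: a depth-first (preorder) label sequence with an explicit backtrack marker. The abstract does not name its two encodings, so this choice is an assumption made for illustration, and the names `dfs_encode`, `dfs_decode`, and `BACKTRACK` are hypothetical rather than taken from the paper.

```python
from typing import List, Tuple

# A rooted, ordered, labeled tree represented as (label, children).
Tree = Tuple[str, list]

# Hypothetical marker meaning "move back up to the parent node".
BACKTRACK = "-1"


def dfs_encode(tree: Tree) -> List[str]:
    """Flatten a rooted ordered tree into a flat preorder sequence."""
    label, children = tree
    out = [label]
    for child in children:
        out.extend(dfs_encode(child))
        out.append(BACKTRACK)  # leave the child's subtree
    return out


def dfs_decode(seq: List[str]) -> Tree:
    """Rebuild the tree from its sequence, showing the encoding is lossless."""
    root: Tree = (seq[0], [])
    stack = [root]  # path from the root to the current node
    for tok in seq[1:]:
        if tok == BACKTRACK:
            stack.pop()
        else:
            node: Tree = (tok, [])
            stack[-1][1].append(node)
            stack.append(node)
    return root


# A has children B and C; B has a single child D.
example: Tree = ("A", [("B", [("D", [])]), ("C", [])])
encoded = dfs_encode(example)
print(encoded)  # ['A', 'B', 'D', '-1', '-1', 'C', '-1']
```

Because the whole tree lives in one flat sequence, it can be stored in a contiguous array and scanned linearly, which is the kind of layout that tends to be cache-friendly. How the paper's algorithms actually generate and count candidates over such sequences is not described in the abstract and is not modeled here.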