TRIPS and TIDES: new algorithms for tree mining

Authors: Shirish Tatikonda; Srinivasan Parthasarathy; Tahsin Kurc
Affiliations: The Ohio State University, Columbus, OH (all authors)
Venue: CIKM '06: Proceedings of the 15th ACM International Conference on Information and Knowledge Management
Year: 2006

Abstract

Recent research in data mining has progressed from mining frequent itemsets to more general, structured patterns such as trees and graphs. In this paper, we address the problem of frequent subtree mining, which has proven viable in a wide range of applications such as bioinformatics, XML processing, computational linguistics, and web usage mining. We propose novel algorithms to mine frequent subtrees from a database of rooted trees. We study two popular sequential encodings of trees and use them to systematically generate and evaluate candidate patterns. The proposed approach is generic: it can mine embedded or induced subtrees that are labeled, unlabeled, ordered, unordered, or edge-labeled. Because they rely on compact, simple array-based data structures, our algorithms are highly cache-conscious; L1 and L2 cache hit rates typically exceed 99%. Experimental evaluation shows that our algorithms achieve speedups of up to several orders of magnitude on real datasets compared to state-of-the-art tree mining algorithms.
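The abstract does not name the two sequential encodings. Purely as an illustration, the sketch below assumes two common choices for rooted, ordered, labeled trees. The first is a Prüfer-style sequence: nodes are numbered in postorder, and the leaf with the smallest number is deleted repeatedly while its parent is recorded. The second is a depth-first (preorder) label sequence that uses a backtrack marker. The array layout (`parent`, `label`), the `$` marker, and all function names are hypothetical, not the paper's data structures. The sketch covers only the encodings, not candidate generation or support counting.

```python
import heapq
from typing import List, Optional, Tuple

# Hypothetical array-based tree: nodes 0..n-1 in preorder, parent[i] is the
# parent of node i (None for the root at index 0), label[i] is its label.


def children_lists(parent: List[Optional[int]]) -> List[List[int]]:
    """Ordered child lists; preorder numbering keeps siblings left to right."""
    kids: List[List[int]] = [[] for _ in parent]
    for v, p in enumerate(parent):
        if p is not None:
            kids[p].append(v)
    return kids


def depth_first_encoding(parent, label, backtrack="$") -> List[str]:
    """Preorder labels, with a backtrack marker after each child subtree."""
    kids = children_lists(parent)
    out: List[str] = []

    def visit(v: int) -> None:
        out.append(label[v])
        for c in kids[v]:
            visit(c)
            out.append(backtrack)  # move back up from child c to v

    visit(0)
    return out


def postorder_numbers(parent) -> List[int]:
    """Map each preorder node id to its 1-based postorder number."""
    kids = children_lists(parent)
    post = [0] * len(parent)
    counter = 0

    def visit(v: int) -> None:
        nonlocal counter
        for c in kids[v]:
            visit(c)
        counter += 1
        post[v] = counter

    visit(0)
    return post


def pruefer_sequences(parent, label) -> Tuple[List[int], List[str]]:
    """Numbered and labeled Prüfer sequences (postorder node numbering).

    Repeatedly delete the remaining leaf with the smallest postorder number
    and record its parent until only the root is left.
    """
    n = len(parent)
    post = postorder_numbers(parent)
    remaining = [0] * n  # number of not-yet-deleted children per node
    for p in parent:
        if p is not None:
            remaining[p] += 1
    leaves = [(post[v], v) for v in range(n) if remaining[v] == 0]
    heapq.heapify(leaves)
    numbered: List[int] = []
    labeled: List[str] = []
    for _ in range(n - 1):
        _, v = heapq.heappop(leaves)
        p = parent[v]
        numbered.append(post[p])
        labeled.append(label[p])
        remaining[p] -= 1
        if remaining[p] == 0 and parent[p] is not None:  # never push the root
            heapq.heappush(leaves, (post[p], p))
    return numbered, labeled
```

For the tree A(B(C), D), encoded as `parent = [None, 0, 1, 0]` and `label = ["A", "B", "C", "D"]`, `depth_first_encoding` yields `A B C $ $ D $`. `pruefer_sequences` yields `[2, 4, 4]` and `["B", "A", "A"]`. Under postorder numbering, every leaf is removed in postorder, so the numbered sequence simply lists the parent of each non-root node in postorder. Both encodings are flat arrays of length O(n), which suits the compact, array-based layout the abstract describes.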