Mining Induced/Embedded Subtrees using the Level of Embedding Constraint

Authors:
Henry Tan;Fedja Hadzic;Tharam S. Dillon
Affiliations:
Department of Computing, Curtin University of Technology, Perth, Australia, fedja.hadzic@curtin.edu.au;(Correspd.) Department of Computing, Curtin University of Technology, Perth, Australia, fedja.hadzic@curtin.edu.au;Department of Computing, Curtin University of Technology, Perth, Australia, fedja.hadzic@curtin.edu.au
Venue:
Fundamenta Informaticae
Year:
2012

Citing 32
Cited 0

Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Dynamic itemset counting and implication rules for market basket data

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Fast discovery of association rules

Advances in knowledge discovery and data mining
Efficiently mining long patterns from databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Discovering typical structures of documents: a road map approach

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Semistructured data and XML

Information organization and databases
Using a Hash-Based Method with Transaction Trimming for Mining Association Rules

IEEE Transactions on Knowledge and Data Engineering
Mining Sequential Patterns

ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data

PKDD '00 Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
TreeFinder: a First Step towards XML Data Mining

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
gSpan: Graph-Based Substructure Pattern Mining

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Constraint-Based Rule Mining in Large, Dense Databases

ICDE '99 Proceedings of the 15th International Conference on Data Engineering
Efficient Data Mining for Maximal Frequent Subtrees

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
CloseGraph: mining closed frequent graph patterns

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Fast vertical mining using diffsets

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Frequent free tree discovery in graph data

Proceedings of the 2004 ACM symposium on Applied computing
HybridTreeMiner: An Efficient Algorithm for Mining Frequent Rooted Trees and Free Trees Using Canonical Forms

SSDBM '04 Proceedings of the 16th International Conference on Scientific and Statistical Database Management
An Efficient Algorithm for Discovering Frequent Subgraphs

IEEE Transactions on Knowledge and Data Engineering
XAR-miner: efficient association rules mining for XML data

WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications

IEEE Transactions on Knowledge and Data Engineering
Protein Ontology: Vocabulary for Protein Data

ICITA '05 Proceedings of the Third International Conference on Information Technology and Applications (ICITA'05) Volume 2 - Volume 02
AMIOT: Induced Ordered Tree Mining in Tree-Structured Databases

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
TRIPS and TIDES: new algorithms for tree mining

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Efficient mining of XML query patterns for caching

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
An XML-enabled data mining query language: XML-DMQL

International Journal of Business Intelligence and Data Mining
Tree model guided candidate generation for mining frequent subtrees from XML documents

ACM Transactions on Knowledge Discovery from Data (TKDD)
Substructure discovery using minimum description length and background knowledge

Journal of Artificial Intelligence Research
Mining interesting XML-enabled association rules with templates

KDID'04 Proceedings of the Third international conference on Knowledge Discovery in Inductive Databases
IMB3-Miner: mining induced/embedded subtrees by constraining the level of embedding

PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Frequent Subtree Mining - An Overview

Fundamenta Informaticae - Advances in Mining Graphs, Trees and Sequences

Quantified Score

Hi-index	0.00

Visualization

Abstract

The increasing need for representing information through more complex structures where semantics and relationships among data objects can be more easily expressed has resulted in many semi-structured data sources. Structure comparison among semi-structured data objects can often reveal valuable information, and hence tree mining has gained a considerable amount of interest in areas such as XML mining, Bioinformatics, Web mining etc. We are primarily concerned with the task of mining frequent ordered induced and embedded subtrees from a database of rooted ordered labeled trees. Our previous contributions consist of the efficient Tree Model Guided (TMG) candidate enumeration approach for which we developed a mathematical model that provides an estimate of the worst case complexity for embedded subtree mining. This potentially reveals computationally impractical situations where one would be forced to constrain the mining process in some way so that at least some patterns can be discovered. This motivated our strategy of tackling the complexity of mining embedded subtrees by introducing the Level of Embedding constraint. Thus, when it is too costly to mine all frequent embedded subtrees, one can decrease the level of embedding constraint gradually down to 1, from which all the obtained frequent subtrees are induced subtrees. In this paper we develop alternative implementations and propose two algorithms MB3-R and iMB3-R, which achieve better efficiency in terms of time and space. Furthermore, we develop a mathematical model for estimating the worst case complexity for induced subtree mining. It is accompanied with a theoretical analysis of induced-embedded subtree relationships in terms of complexity for frequent subtree mining. Using synthetic and real world data we practically demonstrate the space and time efficiency of our new approach and provide some comparisons to the two well know algorithms for mining induced and embedded subtrees.