Tree model guided candidate generation for mining frequent subtrees from XML documents

Authors:
Henry Tan;Fedja Hadzic;Tharam S. Dillon;Elizabeth Chang;Ling Feng
Affiliations:
Univ. of Technology Sydney, Australia;Univ. of Technology Sydney, Australia;Univ. of Technology Sydney, Australia;Curtin Univ. of Technology, Perth, Australia;Tsinghua Univ., China
Venue:
ACM Transactions on Knowledge Discovery from Data (TKDD)
Year:
2008

Citing 27
Cited 11

Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Fast discovery of association rules

Advances in knowledge discovery and data mining
Efficiently mining long patterns from databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Discovering typical structures of documents: a road map approach

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
DTD inference for views of XML data

PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
A survey in indexing and searching XML documents

Journal of the American Society for Information Science and Technology - XML
Semistructured data and XML

Information organization and databases
Optimized Substructure Discovery for Semi-structured Data

PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
TreeFinder: a First Step towards XML Data Mining

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
gSpan: Graph-Based Substructure Pattern Mining

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Efficient Data Mining for Maximal Frequent Subtrees

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Extracting association rules from XML documents using XQuery

WIDM '03 Proceedings of the 5th ACM international workshop on Web information and data management
Fast vertical mining using diffsets

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Frequent free tree discovery in graph data

Proceedings of the 2004 ACM symposium on Applied computing
HybridTreeMiner: An Efficient Algorithm for Mining Frequent Rooted Trees and Free Trees Using Canonical Forms

SSDBM '04 Proceedings of the 16th International Conference on Scientific and Statistical Database Management
An Efficient Algorithm for Discovering Frequent Subgraphs

IEEE Transactions on Knowledge and Data Engineering
XAR-miner: efficient association rules mining for XML data

WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications

IEEE Transactions on Knowledge and Data Engineering
Protein Ontology: Vocabulary for Protein Data

ICITA '05 Proceedings of the Third International Conference on Information Technology and Applications (ICITA'05) Volume 2 - Volume 02
Cache-conscious frequent pattern mining on a modern processor

VLDB '05 Proceedings of the 31st international conference on Very large data bases
TRIPS and TIDES: new algorithms for tree mining

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Frequent Subtree Mining - An Overview

Fundamenta Informaticae - Advances in Mining Graphs, Trees and Sequences
Efficient mining of XML query patterns for caching

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
An XML-enabled data mining query language: XML-DMQL

International Journal of Business Intelligence and Data Mining
Mining interesting XML-enabled association rules with templates

KDID'04 Proceedings of the Third international conference on Knowledge Discovery in Inductive Databases
IMB3-Miner: mining induced/embedded subtrees by constraining the level of embedding

PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining

Mining Unordered Distance-Constrained Embedded Subtrees

DS '08 Proceedings of the 11th International Conference on Discovery Science
U3 - Mning Unordered Embedded Subtrees Using TMG Candidate Generation

WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Mining association rules in tree structured XML data

Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human
Semantic Wiki as a Basis for Software Engineering Ontology Evolution

OTM '09 Proceedings of the Confederated International Workshops and Posters on On the Move to Meaningful Internet Systems: ADI, CAMS, EI2N, ISDE, IWSSA, MONET, OnToContent, ODIS, ORM, OTM Academy, SWWS, SEMELS, Beyond SAWSDL, and COMBEK 2009
A statistical interestingness measures for XML based association rules

PRICAI'10 Proceedings of the 11th Pacific Rim international conference on Trends in artificial intelligence
Model guided algorithm for mining unordered embedded subtrees

Web Intelligence and Agent Systems
Data mining and model trees study on GDP and its influence factors

AIASABEBI'11 Proceedings of the 11th WSEAS international conference on Applied informatics and communications, and Proceedings of the 4th WSEAS International conference on Biomedical electronics and biomedical informatics, and Proceedings of the international conference on Computational engineering in systems applications
A structure preserving flat data format representation for tree-structured data

PAKDD'11 Proceedings of the 15th international conference on New Frontiers in Applied Data Mining
Mining Induced/Embedded Subtrees using the Level of Embedding Constraint

Fundamenta Informaticae
Application of tree-structured data mining for analysis of process logs in XML format

AusDM '12 Proceedings of the Tenth Australasian Data Mining Conference - Volume 134
Mining of closed frequent subtrees from frequently updated databases

Intelligent Data Analysis

Quantified Score

Hi-index	0.00

Visualization

Abstract

Due to the inherent flexibilities in both structure and semantics, XML association rules mining faces few challenges, such as: a more complicated hierarchical data structure and ordered data context. Mining frequent patterns from XML documents can be recast as mining frequent tree structures from a database of XML documents. In this study, we model a database of XML documents as a database of rooted labeled ordered subtrees. In particular, we are mainly concerned with mining frequent induced and embedded ordered subtrees. Our main contributions are as follows. We describe our unique embedding list representation of the tree structure, which enables efficient implementation of our Tree Model Guided (TMG) candidate generation. TMG is an optimal, nonredundant enumeration strategy that enumerates all the valid candidates that conform to the structural aspects of the data. We show through a mathematical model and experiments that TMG has better complexity compared to the commonly used join approach. In this article, we propose two algorithms, MB3-Miner and iMB3-Miner. MB3-Miner mines embedded subtrees. iMB3-Miner mines induced and/or embedded subtrees by using the maximum level of embedding constraint. Our experiments with both synthetic and real datasets against two well-known algorithms for mining induced and embedded subtrees, demonstrate the effectiveness and the efficiency of the proposed techniques.