IMB3-Miner: mining induced/embedded subtrees by constraining the level of embedding

Authors:
Henry Tan;Tharam S. Dillon;Fedja Hadzic;Elizabeth Chang;Ling Feng
Affiliations:
Faculty of Information Technology, University of Technology Sydney, Sydney, Australia;Faculty of Information Technology, University of Technology Sydney, Sydney, Australia;Faculty of Information Technology, University of Technology Sydney, Sydney, Australia;School of Information System, Curtin University of Technology, Perth, Australia;Department of Computer Science, University of Twente, Enschede, Netherlands
Venue:
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Year:
2006

Abstract

Tree mining has recently attracted considerable interest in areas such as bioinformatics, XML mining, and Web mining. We are mainly concerned with mining frequent induced and embedded subtrees. Mining embedded subtrees can yield more interesting patterns, but discovering such embedding relationships can be very costly. In this paper, we propose an efficient approach that tackles the complexity of mining embedded subtrees by combining a novel Embedding List representation, Tree Model Guided enumeration, and a Level of Embedding constraint. When mining all frequent embedded subtrees is too costly, the level of embedding constraint can be decreased gradually down to 1, at which point all frequent subtrees obtained are induced subtrees. Our experiments on both synthetic and real datasets, comparing against two known algorithms for mining induced and embedded subtrees, FREQT and TreeMiner, demonstrate the effectiveness and efficiency of the technique.
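
To illustrate how a level of embedding constraint relates embedded and induced subtrees, the following is a minimal sketch, not the IMB3-Miner algorithm itself. It assumes the level of embedding between an ancestor and a descendant is the length of the path connecting them; the function name `ancestor_descendant_pairs`, the parent-array encoding, and the sample tree are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Set, Tuple


def ancestor_descendant_pairs(
    parent: List[Optional[int]], max_level: int
) -> Set[Tuple[int, int]]:
    """Return (ancestor, descendant) pairs whose connecting path has
    length at most max_level.

    parent[i] is the parent of node i, or None for the root
    (an assumed encoding, not one taken from the paper).
    """
    pairs: Set[Tuple[int, int]] = set()
    for node in range(len(parent)):
        ancestor = parent[node]
        level = 1  # path length from the current ancestor down to node
        while ancestor is not None and level <= max_level:
            pairs.add((ancestor, node))
            ancestor = parent[ancestor]
            level += 1
    return pairs


# Hypothetical tree: a chain 0 -> 1 -> 2 -> 3, plus a second child 4 of the root.
parent = [None, 0, 1, 2, 0]

# A limit of 1 keeps only parent-child edges (the induced case).
induced = ancestor_descendant_pairs(parent, 1)

# A larger limit also admits ancestor-descendant relationships that skip nodes.
embedded = ancestor_descendant_pairs(parent, 3)

print(sorted(induced))
print(sorted(embedded - induced))
```

Under these assumptions, raising the limit only ever adds relationships, so the set of candidate patterns grows with the limit and tightening it toward 1 shrinks the search space back to induced subtrees.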