Canonical forms for labelled trees and their applications in frequent subtree mining

Authors:
Yun Chi;Yirong Yang;Richard R. Muntz
Affiliations:
Department of Computer Science, University of California, 90095, Los Angeles, CA, USA;Department of Computer Science, University of California, 90095, Los Angeles, CA, USA;Department of Computer Science, University of California, 90095, Los Angeles, CA, USA
Venue:
Knowledge and Information Systems
Year:
2005

Cited by 20

Mining Closed and Maximal Frequent Subtrees from Databases of Labeled Rooted Trees

IEEE Transactions on Knowledge and Data Engineering
Mining tree queries in a graph

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Frequent subgraph mining in outerplanar graphs

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Discovering Frequent Agreement Subtrees from Phylogenetic Data

IEEE Transactions on Knowledge and Data Engineering
Discovery of Useful Patterns from Tree-Structured Documents with Label-Projected Database

ATC '08 Proceedings of the 5th international conference on Autonomic and Trusted Computing
Taming verification hardness: an efficient algorithm for testing subgraph isomorphism

Proceedings of the VLDB Endowment
Mining Unordered Distance-Constrained Embedded Subtrees

DS '08 Proceedings of the 11th International Conference on Discovery Science
U3 - Mining Unordered Embedded Subtrees Using TMG Candidate Generation

WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Classification of ductal tree structures in galactograms

ISBI'09 Proceedings of the Sixth IEEE international conference on Symposium on Biomedical Imaging: From Nano to Macro
Quantitative analysis of treebanks using frequent subtree mining methods

TextGraphs-4 Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing
Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams

Proceedings of the 2010 conference on Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams
Bottom-up discovery of clusters of maximal ranges in HTML trees for search engines results extraction

BIS'07 Proceedings of the 10th international conference on Business information systems
Mining maximal frequent subtrees with lists-based pattern-growth method

APWeb'08 Proceedings of the 10th Asia-Pacific web conference on Progress in WWW research and development
Frequent subgraph mining in outerplanar graphs

Data Mining and Knowledge Discovery
POTMiner: mining ordered, unordered, and partially-ordered trees

Knowledge and Information Systems
Model guided algorithm for mining unordered embedded subtrees

Web Intelligence and Agent Systems
Varro: an algorithm and toolkit for regular structure discovery in treebanks

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Comparison of methods for classification of breast ductal branching patterns

IWDM'06 Proceedings of the 8th international conference on Digital Mammography
Extraction of interesting financial information from heterogeneous XML-Based data

ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part IV
A simple yet efficient approach for maximal frequent subtrees extraction from a collection of XML documents

WISE'06 Proceedings of the 7th international conference on Web Information Systems

Abstract

Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, and computer networks. In this paper, we first present two canonical forms for labelled rooted unordered trees: the breadth-first canonical form (BFCF) and the depth-first canonical form (DFCF). We then apply these canonical forms to the frequent subtree mining problem. Based on the BFCF, we develop a vertical mining algorithm, RootedTreeMiner, to discover all frequently occurring subtrees in a database of labelled rooted unordered trees. RootedTreeMiner uses an enumeration tree to enumerate all (frequent) labelled rooted unordered subtrees. Next, we extend the definition of the DFCF to labelled free trees and present an Apriori-like algorithm, FreeTreeMiner, to discover all frequently occurring subtrees in a database of labelled free trees. Finally, we study the performance and scalability of our algorithms through extensive experiments on both synthetic data and datasets from real applications.
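
To illustrate why a canonical form matters for unordered trees, the following is a minimal sketch of a sorted depth-first string encoding. It is not the paper's BFCF or DFCF definition, only a generic encoding in the same spirit: two labelled rooted unordered trees receive the same string exactly when they are isomorphic. The `(label, children)` tuple representation, the function names `canonical_string` and `is_isomorphic`, and the assumption that labels contain no parentheses are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Hypothetical tree representation for this sketch: (label, list of child trees).
# Labels are assumed to contain no "(" or ")" characters.
Tree = Tuple[str, List["Tree"]]


def canonical_string(tree: Tree) -> str:
    """Encode a labelled rooted unordered tree as an order-independent string.

    Each subtree is written as "(" + label + encoded children + ")".
    Sorting the child encodings removes the effect of sibling order.
    """
    label, children = tree
    encoded_children = sorted(canonical_string(child) for child in children)
    return "(" + label + "".join(encoded_children) + ")"


def is_isomorphic(a: Tree, b: Tree) -> bool:
    """Two unordered trees are isomorphic iff their canonical strings match."""
    return canonical_string(a) == canonical_string(b)


# Same unordered tree written with two different sibling orders.
t1: Tree = ("A", [("B", [("D", [])]), ("C", [])])
t2: Tree = ("A", [("C", []), ("B", [("D", [])])])
# A different tree: D hangs under C instead of B.
t3: Tree = ("A", [("B", []), ("C", [("D", [])])])

print(canonical_string(t1))
print(is_isomorphic(t1, t2), is_isomorphic(t1, t3))
```

In a mining setting, a string of this kind could serve as a dictionary key so that each candidate subtree is counted once regardless of how its siblings happen to be ordered; how RootedTreeMiner and FreeTreeMiner actually use their canonical forms is described in the paper itself.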