We study a new data mining problem: the discovery of frequent agreement subtrees (FASTs) from a set of phylogenetic trees. A phylogenetic tree, or phylogeny, is an unordered tree, that is, one in which the order among siblings is unimportant. Each leaf has a label representing a taxon (species or organism) name, whereas internal nodes are unlabeled. The tree may be rooted, with the root representing the common ancestor of all species in the tree, or unrooted. An unrooted phylogeny arises when there is insufficient evidence to infer a common ancestor of the taxa in the tree. The FAST problem addressed here is a natural extension of the maximum agreement subtree (MAST) problem, which is widely studied in the computational phylogenetics community. The paper establishes a framework for tackling the FAST problem for both rooted and unrooted phylogenetic trees using data mining techniques. We first develop a novel canonical form for rooted trees, together with a phylogeny-aware tree expansion scheme for generating candidate subtrees level by level. We then present an efficient heuristic that finds all frequent agreement subtrees in a given set of rooted trees through an Apriori-like approach, and we show the correctness and completeness of the proposed method. Finally, we discuss extensions of the techniques to unrooted trees. Experimental results demonstrate that the proposed methods perform well and can find interesting patterns in both synthetic data and real phylogenetic trees.
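To make the level-by-level idea concrete, the following is a minimal sketch of Apriori-style mining over rooted, leaf-labeled trees. It is not the paper's algorithm: the nested-tuple tree encoding, the sorted-string canonical form, the absolute support threshold, and all function names (`leaves`, `restrict`, `canonical`, `frequent_agreement_subtrees`) are assumptions made for illustration. The sketch relies on one property: if a pattern on a leaf set is supported by enough trees, so is its restriction to any subset of those leaves, so leaf sets without a frequent pattern need not be extended.

```python
from collections import Counter
from itertools import combinations

# Assumed encoding: a leaf is a string (taxon name), an internal node is a
# tuple of children. Taxon names must not contain '(', ')' or ','.

def leaves(tree):
    """Return the set of leaf labels of a tree."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found

def restrict(tree, keep):
    """Restrict a tree to the leaves in `keep`, suppressing unary nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [r for r in (restrict(c, keep) for c in tree) if r is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)

def canonical(tree):
    """Order-independent string form: children are sorted recursively."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(sorted(canonical(c) for c in tree)) + ")"

def frequent_agreement_subtrees(trees, min_support):
    """Map each frequent pattern (canonical string) to its support count.

    `min_support` is an absolute number of trees (an assumption of this sketch).
    """
    leaf_sets = [leaves(t) for t in trees]
    taxa = sorted(set().union(*leaf_sets))
    frequent = {}
    level = [frozenset(pair) for pair in combinations(taxa, 2)]
    while level:
        survivors = set()
        for leaf_set in level:
            counts = Counter(
                canonical(restrict(t, leaf_set))
                for t, ls in zip(trees, leaf_sets)
                if leaf_set <= ls
            )
            hits = {p: c for p, c in counts.items() if c >= min_support}
            if hits:
                frequent.update(hits)
                survivors.add(leaf_set)
        # Join step: grow by one leaf, keeping a candidate only if every
        # sub-leaf-set one size smaller survived the previous level.
        candidates = set()
        for a in survivors:
            for b in survivors:
                union = a | b
                if len(union) == len(a) + 1 and all(
                    union - {x} in survivors for x in union
                ):
                    candidates.add(union)
        level = sorted(candidates, key=sorted)
    return frequent

# Usage on three small hypothetical phylogenies over taxa a, b, c, d.
example_trees = [
    ((("a", "b"), "c"), "d"),
    ((("a", "b"), "d"), "c"),
    ((("a", "c"), "b"), "d"),
]
for pattern, support in sorted(frequent_agreement_subtrees(example_trees, 2).items()):
    print(pattern, support)
```

The sketch enumerates leaf sets rather than expanding tree patterns directly, which keeps the code short but would not scale to large taxon sets; it is meant only to illustrate the canonical-form and candidate-pruning ideas described above.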