Discovering Frequent Agreement Subtrees from Phylogenetic Data

Authors:
Sen Zhang;Jason T. L. Wang
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2008

Abstract

We study a new data mining problem: discovering frequent agreement subtrees (FASTs) from a set of phylogenetic trees. A phylogenetic tree, or phylogeny, is an unordered tree, that is, a tree in which the order among siblings is unimportant. Each leaf has a label representing a taxon (species or organism) name, whereas internal nodes are unlabeled. The tree may have a root, representing the common ancestor of all species in the tree, or may be unrooted. An unrooted phylogeny arises when there is insufficient evidence to infer a common ancestor of the taxa in the tree. The FAST problem addressed here is a natural extension of the maximum agreement subtree (MAST) problem, which is widely studied in the computational phylogenetics community. The paper establishes a framework for tackling the FAST problem for both rooted and unrooted phylogenetic trees using data mining techniques. We first develop a novel canonical form for rooted trees, together with a phylogeny-aware tree expansion scheme for generating candidate subtrees level by level. We then present an efficient algorithm that finds all frequent agreement subtrees in a given set of rooted trees through an Apriori-like approach, and we show the correctness and completeness of the proposed method. Finally, we discuss extensions of the techniques to unrooted trees. Experimental results demonstrate that the proposed methods work well and are capable of finding interesting patterns in both synthetic data and real phylogenetic trees.
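
The level-wise idea described in the abstract can be illustrated with a small sketch for rooted trees. This is not the paper's algorithm: the nested-tuple tree encoding, the sort-based canonical form, the leaf-set candidate join, and the names `leaves`, `restrict`, and `frequent_agreement_subtrees` are all assumptions made for illustration. The sketch restricts each tree to a candidate set of taxa, canonicalizes the result so that sibling order does not matter, and counts how many trees agree on the same restricted topology.

```python
from collections import Counter
from itertools import combinations

# Assumed encoding: a leaf is a taxon name (str); an internal node is a tuple of children.

def leaves(tree):
    """Return the set of leaf labels of a rooted tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def restrict(tree, keep):
    """Restrict a tree to the leaves in `keep`, suppress unary nodes,
    and return an order-independent canonical form (or None if empty)."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [r for r in (restrict(child, keep) for child in tree) if r is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]  # a node with one surviving child is collapsed
    return tuple(sorted(kids, key=repr))  # sorting makes sibling order irrelevant

def frequent_agreement_subtrees(trees, min_support):
    """Level-wise (Apriori-like) search; min_support is an absolute tree count."""
    leaf_sets = [leaves(t) for t in trees]
    taxa = sorted(set().union(*leaf_sets))
    frequent = {}  # canonical subtree -> number of supporting trees
    level = [frozenset(pair) for pair in combinations(taxa, 2)]
    size = 2
    while level:
        survivors = set()
        for cand in level:
            counts = Counter(
                restrict(t, cand) for t, ls in zip(trees, leaf_sets) if cand <= ls
            )
            for pattern, n in counts.items():
                if n >= min_support:
                    frequent[pattern] = n
                    survivors.add(cand)
        # Join step: a larger leaf set is a candidate only if every subset
        # one taxon smaller carried at least one frequent pattern.
        size += 1
        next_level = set()
        for a, b in combinations(survivors, 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in survivors for sub in combinations(union, size - 1)
            ):
                next_level.add(union)
        level = list(next_level)
    return frequent

# Hypothetical toy input: three rooted trees over the taxa a, b, c, d.
trees = [
    ((("a", "b"), "c"), "d"),
    ((("a", "b"), "d"), "c"),
    (("a", "c"), ("b", "d")),
]
patterns = frequent_agreement_subtrees(trees, min_support=2)
for pattern, support in sorted(patterns.items(), key=lambda kv: repr(kv[0])):
    print(support, pattern)
```

The pruning relies on downward closure: if several trees agree on a topology over a set of taxa, they also agree on every restriction of that topology to a smaller set of taxa, so a leaf set with no frequent pattern cannot be extended into one that has a frequent pattern.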