Uncovering Hidden Phylogenetic Consensus in Large Data Sets

Authors:
Nicholas Pattengale;Andre Aberer;Krister Swenson;Alexandros Stamatakis;Bernard Moret
Affiliations:
Sandia National Laboratories, Albuquerque;TUM, Munich;University of Ottawa/Université du Québec à Montréal, Montreal;TUM, Munich;Ecole Polytechnique Federale de Lausanne
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2011

Abstract

Many of the steps in phylogenetic reconstruction can be confounded by "rogue" taxa: taxa that cannot be placed with assurance anywhere within the tree and whose location varies with almost any choice of algorithm or parameters. Phylogenetic consensus methods, in particular, are known to suffer from this problem. In this paper, we provide a novel framework to define and identify rogue taxa. In this framework, we formulate a bicriterion optimization problem, the relative information criterion, that models the net increase in useful information present in the consensus tree when certain taxa are removed from the input data. We also provide an effective greedy heuristic to identify a subset of rogue taxa and use this heuristic in a series of experiments, with both pathological examples from the literature and a collection of large biological data sets. Because the presence of rogue taxa in a set of bootstrap replicates can lead to misleadingly poor support values, we propose a procedure to recompute support values in light of the rogue taxa identified by our algorithm. Applying this procedure to our biological data sets caused a large number of edges to move from "unsupported" to "supported" status, indicating that many existing phylogenies should be recomputed and reevaluated to reduce any inaccuracies introduced by rogue taxa. We also discuss the implementation issues encountered while integrating our algorithm into RAxML v7.2.7, particularly those dealing with scaling up the analyses. This integration enables practitioners to benefit from our algorithm in the analysis of very large data sets (up to 2,500 taxa and 10,000 trees, although we present the results of even larger analyses).
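
To make the idea of greedily removing rogue taxa concrete, the following is a minimal sketch and not the authors' algorithm or the RAxML implementation. It rests on several assumptions that do not come from the abstract: trees are rooted and given as sets of clades, the consensus is majority-rule, and the score is simply the number of nontrivial consensus clades, a toy stand-in for the relative information criterion. The names `consensus_score` and `greedy_rogues` are hypothetical.

```python
from collections import Counter

def consensus_score(trees, keep):
    """Toy score (an assumption, not the paper's criterion): the number of
    nontrivial clades in the majority-rule consensus after restricting
    every tree to the taxa in `keep`."""
    counts = Counter()
    for tree in trees:
        # Restrict each clade to the kept taxa; a set removes clades that
        # become identical within one tree after pruning.
        restricted = {clade & keep for clade in tree}
        counts.update(c for c in restricted if 1 < len(c) < len(keep))
    # A clade enters the consensus if it occurs in more than half the trees.
    return sum(1 for n in counts.values() if 2 * n > len(trees))

def greedy_rogues(trees, taxa):
    """Repeatedly drop the single taxon whose removal most increases the
    toy score; stop as soon as no removal gives a strict improvement."""
    keep = frozenset(taxa)
    dropped = []
    best = consensus_score(trees, keep)
    while len(keep) > 3:
        score, taxon = max(
            (consensus_score(trees, keep - {t}), t) for t in sorted(keep)
        )
        if score <= best:
            break
        keep, best = keep - {taxon}, score
        dropped.append(taxon)
    return dropped, best

# Hypothetical input: four rooted trees sharing the backbone ((A,B),(C,D)),
# with taxon R attached in a different place in each tree.
fs = frozenset
example_trees = [
    {fs("AR"), fs("ABR"), fs("CD")},   # (((A,R),B),(C,D))
    {fs("AB"), fs("CR"), fs("CDR")},   # ((A,B),((C,R),D))
    {fs("BR"), fs("ABR"), fs("CD")},   # ((A,(B,R)),(C,D))
    {fs("AB"), fs("DR"), fs("CDR")},   # ((A,B),(C,(D,R)))
]

print(greedy_rogues(example_trees, "ABCDR"))  # prints (['R'], 2)
```

In this toy input no clade reaches a majority while R is present, so the consensus is unresolved; once R is dropped, the clades {A,B} and {C,D} appear in every tree. A plain clade count ignores the cost of discarding taxa, which is the trade-off the bicriterion formulation described above is meant to capture.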