Uncovering hidden phylogenetic consensus

  • Authors:
  • Nicholas D. Pattengale;Krister M. Swenson;Bernard M. E. Moret

  • Affiliations:
  • Department of Computer Science, University of New Mexico;Department of Mathematics and Statistics, University of Ottawa, Canada;Laboratory for Computational Biology and Bioinformatics, EPFL, Switzerland

  • Venue:
  • ISBRA'10 Proceedings of the 6th international conference on Bioinformatics Research and Applications
  • Year:
  • 2010

Abstract

Many of the steps in phylogenetic reconstruction can be confounded by “rogue” taxa: taxa that cannot be placed with assurance anywhere within the tree and whose location, in fact, varies with almost any choice of algorithm or parameters. Phylogenetic consensus methods, in particular, are known to suffer from this problem. In this paper we provide a novel framework in which to define and identify rogue taxa. In this framework, we formulate a bicriterion optimization problem that models the net increase in useful information present in the consensus tree when certain taxa are removed from the input data. We also provide an effective greedy heuristic to identify a subset of rogue taxa and use it in a series of experiments on both pathological examples described in the literature and a collection of large biological datasets. Because the presence of rogue taxa in a set of bootstrap replicates can lead to deceptively poor support values, we propose a procedure to recompute support values in light of the rogue taxa identified by our algorithm. Applying this procedure to our biological datasets caused a large number of edges to change from “unsupported” to “supported” status, indicating that many existing phylogenies should be recomputed and reevaluated to reduce any inaccuracies introduced by rogue taxa.
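
The abstract does not spell out the objective or the heuristic, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes rooted trees encoded as sets of clusters (each cluster a frozenset of taxon names), a majority-rule consensus, and a hypothetical score equal to the number of nontrivial consensus clusters minus a fixed `penalty` per dropped taxon. All function names, the score, and the toy data are illustrative assumptions.

```python
from collections import Counter


def restrict(tree, keep):
    """Restrict a tree (a set of clusters) to the taxa in `keep`,
    discarding clusters that become trivial (size < 2 or all of `keep`)."""
    out = set()
    for cluster in tree:
        c = cluster & keep
        if 1 < len(c) < len(keep):
            out.add(c)
    return out


def cluster_support(trees, keep):
    """Fraction of input trees that contain each restricted cluster."""
    counts = Counter()
    for tree in trees:
        counts.update(restrict(tree, keep))
    return {c: n / len(trees) for c, n in counts.items()}


def majority_clusters(trees, keep):
    """Clusters present in strictly more than half of the restricted trees."""
    return {c for c, s in cluster_support(trees, keep).items() if s > 0.5}


def score(trees, keep, all_taxa, penalty):
    """Assumed objective: consensus clusters gained versus taxa dropped."""
    dropped = len(all_taxa) - len(keep)
    return len(majority_clusters(trees, keep)) - penalty * dropped


def greedy_prune(trees, taxa, penalty=0.5):
    """Repeatedly drop the single taxon that most improves the assumed score;
    stop as soon as no single removal improves it."""
    all_taxa = frozenset(taxa)
    keep = all_taxa
    best = score(trees, keep, all_taxa, penalty)
    while len(keep) > 3:
        candidates = [(score(trees, keep - {t}, all_taxa, penalty), t)
                      for t in sorted(keep)]
        cand_score, taxon = max(candidates, key=lambda p: p[0])
        if cand_score <= best:
            break
        keep, best = keep - {taxon}, cand_score
    return all_taxa - keep, majority_clusters(trees, keep)


# Toy data (invented for illustration): backbone (((A,B),C),(D,E)) with a
# taxon R attached next to A, D, C, and E in the four trees respectively.
fs = frozenset
trees = [
    {fs("AR"), fs("ABR"), fs("ABCR"), fs("DE")},
    {fs("AB"), fs("ABC"), fs("DR"), fs("DER")},
    {fs("AB"), fs("CR"), fs("ABCR"), fs("DE")},
    {fs("AB"), fs("ABC"), fs("ER"), fs("DER")},
]
taxa = fs("ABCDER")

dropped, consensus = greedy_prune(trees, taxa)
support_before = cluster_support(trees, taxa)
support_after = cluster_support(trees, taxa - dropped)
print(sorted(dropped), sorted(sorted(c) for c in consensus))
```

In this toy input, comparing `support_before` and `support_after` for the cluster {D, E} mirrors the recomputation of support values described above: the cluster is present in only some of the trees while R is included, and its support is recomputed over the restricted trees once R is dropped. The penalty value controls the trade-off between the two criteria and is a free parameter of this sketch.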