A simulation study comparing supertree and combined analysis methods using SMIDGen

  • Authors:
  • M. Shel Swenson; François Barbançon; C. Randal Linder; Tandy Warnow

  • Affiliations:
  • Department of Mathematics, The University of Texas at Austin; Microsoft, Redmond, WA; Section of Integrative Biology, The University of Texas at Austin; Department of Computer Sciences, The University of Texas at Austin

  • Venue:
  • WABI'09: Proceedings of the 9th International Conference on Algorithms in Bioinformatics
  • Year:
  • 2009

Abstract

Supertree methods comprise one approach to reconstructing large molecular phylogenies given estimated source trees for overlapping subsets of the entire set of taxa. These source trees are combined into a single supertree on the full set of taxa using various algorithmic techniques, with the most common being matrix representation with parsimony (MRP). When the data allow, the competing approach is a combined analysis (also known as a "supermatrix" or "total evidence" approach) whereby the different sequence data matrices for each of the different subsets of taxa are concatenated into a single supermatrix, and a tree is estimated on that supermatrix. In this paper, we report an extensive simulation study comparing the supertree methods MRP and weighted MRP against combined analysis methods on large model trees, using a novel simulation methodology (Super-Method Input Data Generator, or SMIDGen), which better reflects biological processes and the practices of systematists. This study shows that combined analysis based upon maximum likelihood outperforms all the other methods, giving especially big improvements when the largest subtree does not contain most of the taxa.
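
To make the two kinds of input concrete, the following is a minimal Python sketch of how each matrix can be assembled. It is not the authors' pipeline or SMIDGen itself. The function names (`mrp_encode`, `concatenate`), the representation of a source tree as a `(taxa, clades)` pair, and the use of `?` for missing data are all assumptions made for illustration. The MRP part follows the standard encoding: each internal edge of each source tree becomes one binary character, scored `1` for taxa on one side of the edge, `0` for the other taxa in that source tree, and `?` for taxa absent from that tree. The supermatrix part joins per-marker alignments end to end and pads absent taxa with `?`.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a source tree is (taxa, clades), where
# `clades` holds, for each internal edge, the set of taxa on one side.
SourceTree = Tuple[Set[str], List[Set[str]]]


def mrp_encode(source_trees: List[SourceTree]) -> Dict[str, str]:
    """Build an MRP character matrix (taxon -> string of 0/1/?)."""
    all_taxa = sorted(set().union(*(taxa for taxa, _ in source_trees)))
    rows: Dict[str, List[str]] = {t: [] for t in all_taxa}
    for taxa, clades in source_trees:
        for clade in clades:  # one matrix column per internal edge
            for t in all_taxa:
                if t not in taxa:
                    rows[t].append("?")  # taxon missing from this source tree
                elif t in clade:
                    rows[t].append("1")
                else:
                    rows[t].append("0")
    return {t: "".join(chars) for t, chars in rows.items()}


def concatenate(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-marker alignments into one supermatrix.

    Assumes every sequence within a single alignment has the same length.
    """
    all_taxa = sorted(set().union(*(aln.keys() for aln in alignments)))
    rows: Dict[str, List[str]] = {t: [] for t in all_taxa}
    for aln in alignments:
        length = len(next(iter(aln.values())))
        for t in all_taxa:
            # Pad taxa that were not sampled for this marker.
            rows[t].append(aln.get(t, "?" * length))
    return {t: "".join(parts) for t, parts in rows.items()}


# Two four-taxon source trees that overlap on B, C, and D.
trees = [
    ({"A", "B", "C", "D"}, [{"A", "B"}]),
    ({"B", "C", "D", "E"}, [{"D", "E"}]),
]
mrp = mrp_encode(trees)

# Two toy markers sampled for different, overlapping taxon sets.
supermatrix = concatenate([{"A": "AC", "B": "AG"}, {"B": "T", "C": "G"}])
```

In either case the resulting matrix would then be passed to a tree-estimation program (parsimony for the MRP matrix, parsimony or maximum likelihood for the supermatrix); that step is outside the scope of this sketch.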