Heuristics for the gene-duplication problem: a Θ(n) speed-up for the local search

  • Authors: Mukul S. Bansal; J. Gordon Burleigh; Oliver Eulenstein; André Wehe

  • Affiliations: Department of Computer Science, Iowa State University, Ames, IA; National Evolutionary Synthesis Center, Durham, NC; Department of Computer Science, Iowa State University, Ames, IA; Department of Electrical and Computer Engineering, Iowa State University, Ames, IA

  • Venue: RECOMB'07 Proceedings of the 11th annual international conference on Research in computational molecular biology
  • Year: 2007

Abstract

The gene-duplication problem is to infer a species supertree from a collection of gene trees that are confounded by complex histories of gene duplications. This problem is NP-hard and thus requires efficient and effective heuristics. Existing heuristics perform a stepwise search of the tree space, where each step is guided by an exact solution to an instance of a local search problem. We show how this local search problem can be solved efficiently by reusing previously computed information. This improves the running time of the current solution by a factor of n, where n is the number of species in the resulting supertree solution, and makes the gene-duplication problem more tractable for large-scale phylogenetic analyses. We verify the exceptional performance of our solution in a comparison study using sets of large randomly generated gene trees. Furthermore, we demonstrate the utility of our solution by incorporating large genomic data sets from GenBank into a supertree analysis of plants.
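To make the objective concrete, the following is a minimal sketch of the standard gene-duplication cost that such heuristics try to minimize. It is not the authors' algorithm and does not include the speed-up described above. It assumes the usual LCA-mapping model: each gene-tree node maps to the smallest species-tree cluster containing its species, and a node counts as a duplication when it maps to the same cluster as one of its children. The nested-tuple tree encoding and the function names (`clusters`, `lca_cluster`, `duplication_cost`, `total_cost`) are hypothetical choices made for this illustration.

```python
# Trees are nested tuples; leaves are species names (strings).
# Assumption: every gene-tree leaf is labeled by a species in the species tree.

def clusters(tree):
    """Return the leaf set (cluster) of every node in a nested-tuple tree."""
    out = []

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def lca_cluster(species_clusters, leaves):
    """Smallest species-tree cluster containing `leaves` (the LCA mapping)."""
    return min((c for c in species_clusters if leaves <= c), key=len)


def duplication_cost(gene_tree, species_tree):
    """Count gene-tree nodes that map to the same cluster as one of their children."""
    species_clusters = clusters(species_tree)
    count = 0

    def walk(node):
        nonlocal count
        if isinstance(node, str):
            return lca_cluster(species_clusters, frozenset([node]))
        child_maps = [walk(child) for child in node]
        mapped = lca_cluster(species_clusters, frozenset().union(*child_maps))
        if mapped in child_maps:  # same mapping as a child: duplication
            count += 1
        return mapped

    walk(gene_tree)
    return count


def total_cost(gene_trees, species_tree):
    """Sum of duplication costs over a collection of gene trees."""
    return sum(duplication_cost(g, species_tree) for g in gene_trees)


species = (("A", "B"), "C")
print(duplication_cost((("A", "B"), "C"), species))         # congruent gene tree
print(duplication_cost((("A", "C"), ("B", "C")), species))  # root is a duplication
```

A stepwise heuristic would evaluate `total_cost` for every candidate species tree in a neighborhood of the current one and move to the best. Recomputing the cost from scratch for each neighbor, as this sketch would, is the kind of repeated work that reusing previously computed information is meant to avoid.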