The deep coalescence consensus tree problem is Pareto on clusters

Authors:
Harris T. Lin; J. Gordon Burleigh; Oliver Eulenstein
Affiliations:
Department of Computer Science, Iowa State University, Ames, IA; National Evolutionary Synthesis Center, Durham, NC and University of Florida, Gainesville, FL; Department of Computer Science, Iowa State University, Ames, IA
Venue:
ISBRA'11 Proceedings of the 7th international conference on Bioinformatics research and applications
Year:
2011

Abstract

Phylogenetic methods must account for the biological processes that create incongruence between gene trees and the species phylogeny. Deep coalescence, or incomplete lineage sorting, creates discord among gene trees at the early stages of species divergence or when the time between speciation events was short and the ancestral population sizes were large. The deep coalescence problem takes a collection of gene trees and seeks the species tree that implies the fewest deep coalescence events, that is, the smallest deep coalescence reconciliation cost. Although this approach can be useful for phylogenetics, the consensus properties of this problem are largely uncharacterized, and the accuracy of heuristics is untested. We prove that the deep coalescence consensus tree problem satisfies the highly desirable Pareto property for clusters (clades). That is, in all instances, each cluster that is present in all of the input gene trees, called a consensus cluster, will also be found in every optimal solution. We introduce an efficient algorithm that, given a candidate species tree that does not display the consensus clusters, will modify the candidate tree so that it includes all of the clusters and has a lower (better) deep coalescence cost. Simulation experiments demonstrate the efficacy of this algorithm, but they also indicate that even with large trees, most solutions returned by the recent efficient heuristic display the consensus clusters.
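
To make the terms in the abstract concrete, the following minimal Python sketch computes consensus clusters and a deep coalescence cost on a toy instance. It is an illustration under stated assumptions, not the authors' algorithm: trees are rooted, written as nested tuples of leaf names, and all share the same leaf set. The cost is taken to be the number of extra gene lineages summed over the species tree edges, which is one common formulation. All function names (`edges`, `cluster_set`, `consensus_clusters`, `displays`, `deep_coalescence`) are hypothetical. The tree-modification algorithm described in the abstract is not reproduced here.

```python
def edges(tree):
    """Return (root cluster, list of (parent cluster, child cluster) edges)."""
    if isinstance(tree, str):                      # a leaf is a one-taxon cluster
        return frozenset([tree]), []
    results = [edges(child) for child in tree]
    root = frozenset().union(*(c for c, _ in results))
    out = []
    for child_cluster, sub_edges in results:
        out.append((root, child_cluster))
        out.extend(sub_edges)
    return root, out


def cluster_set(tree):
    """All clusters (clades) of a tree, including leaves and the root."""
    root, es = edges(tree)
    return {root} | {child for _, child in es}


def consensus_clusters(gene_trees):
    """Clusters present in every input gene tree."""
    return set.intersection(*(cluster_set(g) for g in gene_trees))


def displays(tree, clusters):
    """True if the tree contains every cluster in `clusters`."""
    return clusters <= cluster_set(tree)


def deep_coalescence(gene_tree, species_tree):
    """Extra lineages summed over species edges (assumed cost definition).

    A gene edge contributes a lineage to the species edge above cluster C
    when its child cluster lies inside C but its parent cluster does not.
    """
    _, g_edges = edges(gene_tree)
    _, s_edges = edges(species_tree)
    cost = 0
    for _, c in s_edges:
        lineages = sum(1 for parent, child in g_edges
                       if child <= c and not parent <= c)
        cost += lineages - 1
    return cost


# Toy instance (assumed data): two gene trees that agree on the clade {a, b}.
gene_trees = [(("a", "b"), ("c", "d")), ((("a", "b"), "c"), "d")]
consensus = consensus_clusters(gene_trees)

without_ab = ((("a", "c"), "b"), "d")   # candidate that breaks up {a, b}
with_ab = ((("a", "b"), "c"), "d")      # candidate that displays {a, b}

print(displays(without_ab, consensus), displays(with_ab, consensus))
print(sum(deep_coalescence(g, without_ab) for g in gene_trees))  # 3
print(sum(deep_coalescence(g, with_ab) for g in gene_trees))     # 1
```

On this toy instance, the candidate that displays the consensus cluster {a, b} has the lower total cost, which is consistent with the Pareto property stated in the abstract. A single example does not, of course, demonstrate the property in general.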