Building species trees from larger parts of phylogenomic databases

Authors:
C. Scornavacca; V. Berry; V. Ranwez
Affiliations:
Center for Bioinformatics ZBIT, Tübingen University, Sand 14, 72076 Tübingen, Germany; LIRMM, CNRS -- Univ. Montpellier 2, 161 rue Ada, 34392 Montpellier Cedex 5, France; ISEM, CNRS -- Univ. Montpellier 2, Place E. Bataillon -- CC 064-34095 Montpellier, France
Venue:
Information and Computation
Year:
2011

Abstract

Gene trees are leaf-labeled trees inferred from molecular sequences. Because gene duplication events occur in genomes, some species host several copies of the same gene, so individual gene trees usually have several leaves labeled with the same species name. Dealing with such multi-labeled gene trees (MUL trees) is a substantial problem in phylogenomics: for example, current supertree methods do not handle MUL trees, which restricts studies aimed at building the Tree of Life to a very small core of single-copy genes. We propose to tackle this problem mainly by transforming a collection of MUL trees into a collection of trees in which each label appears only once. To that end, we provide several fast algorithmic building blocks and describe how they fit into a general framework for building a species tree. First, we propose to preprocess each MUL tree separately in order to remove the parts that are redundant with respect to speciation events. For this purpose, we present a tree isomorphism algorithm for MUL trees. Second, we show how the speciation signal contained in a MUL tree can be represented by a linear-size set of triplets. When this set is topologically coherent (compatible), we show that it can be used to produce a single-copy gene tree that replaces the MUL tree while preserving the information it contains on speciation events. As an alternative approach, we propose to extract from each MUL tree a maximum-size subtree that is free of duplication events. Finally, the algorithms are applied in a supertree analysis of HOGENOM, a database of homologous genes from fully sequenced genomes.
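
The preprocessing step described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a MUL tree is encoded as nested Python tuples with leaves as plain species-name strings, it assumes a node counts as a duplication node when its child subtrees share at least one label, and it uses a simple sorted-string canonical form whose cost is not the fast bound the paper aims for. The function names (`canonical`, `leaf_labels`, `is_duplication_node`, `prune_isomorphic`) are hypothetical.

```python
def canonical(tree):
    """Canonical string of a rooted MUL tree (leaf = str, internal node = tuple).

    Two trees are isomorphic as leaf-labeled trees exactly when their
    canonical strings are equal (assuming labels contain no ',' '(' ')').
    """
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(sorted(canonical(child) for child in tree)) + ")"


def leaf_labels(tree):
    """Set of species labels appearing at the leaves of the tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaf_labels(child) for child in tree))


def is_duplication_node(tree):
    """Assumed criterion: some label occurs under two different children."""
    if isinstance(tree, str):
        return False
    seen = set()
    for child in tree:
        labels = leaf_labels(child)
        if seen & labels:
            return True
        seen |= labels
    return False


def prune_isomorphic(tree):
    """Keep one copy of sibling subtrees that are isomorphic to each other."""
    if isinstance(tree, str):
        return tree
    kept, seen = [], set()
    for child in (prune_isomorphic(c) for c in tree):
        key = canonical(child)
        if key not in seen:  # first time this subtree shape is met
            seen.add(key)
            kept.append(child)
    # A node left with a single child is replaced by that child.
    return kept[0] if len(kept) == 1 else tuple(kept)


# Hypothetical example: species a and b each appear twice under a duplication.
mul_tree = ((("a", "b"), ("b", "a")), "c")
reduced = prune_isomorphic(mul_tree)
```

In this toy example the two copies under the duplication node carry the same speciation information, so one copy is dropped and the result is a single-labeled tree. MUL trees whose duplicated subtrees differ are left with repeated labels by this sketch and would need the triplet-based or subtree-extraction steps the abstract mentions.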