Integrating Sequence and Topology for Efficient and Accurate Detection of Horizontal Gene Transfer

  • Authors:
  • Cuong Than;Guohua Jin;Luay Nakhleh

  • Affiliations:
  • Department of Computer Science, Rice University, Houston, TX 77005, USA;Department of Computer Science, Rice University, Houston, TX 77005, USA;Department of Computer Science, Rice University, Houston, TX 77005, USA

  • Venue:
  • RECOMB-CG '08: Proceedings of the International Workshop on Comparative Genomics
  • Year:
  • 2008

Abstract

One phylogeny-based approach to horizontal gene transfer (HGT) detection entails comparing the topology of a gene tree to that of the species tree, and using their differences to locate HGT events. Another approach is based on augmenting a species tree into a phylogenetic network to improve the fit of the evolution of the gene sequence data under an optimization criterion, such as maximum parsimony (MP). One major problem with the first approach is that gene tree estimates may have wrong branches, which result in false positive estimates of HGT events. The second approach is accurate, yet suffers from the computational complexity of searching through the space of possible phylogenetic networks.

The contributions of this paper are two-fold. First, we present a measure that computes the support of HGT events inferred from pairs of species and gene trees. The measure uses the bootstrap values of the gene tree branches. Second, we present an integrative method to speed up the approaches for augmenting species trees into phylogenetic networks.

We conducted a data analysis and performance study of our methods on a data set of 20 genes from the Amborella mitochondrial genome, in which Jeffrey Palmer and his co-workers postulated a massive amount of horizontal gene transfer. As expected, we found that including poorly supported gene tree branches in the analysis results in a high rate of false positive gene transfer events. Further, the bootstrap-based support measure assessed, with high accuracy, the support of the inferred gene transfer events. Finally, we obtained very promising results, in terms of both speed and accuracy, when applying our integrative method to this data set (we are currently studying the performance in extensive simulations). All methods have been implemented in the PhyloNet and NEPAL tools, which are available in the form of executable code from http://bioinfo.cs.rice.edu.
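
The effect of poorly supported gene tree branches on topology-based detection can be illustrated with a small sketch. It is not the support measure from the paper, and it does not use the PhyloNet or NEPAL interfaces. It assumes rooted trees represented as sets of clades, a hypothetical bootstrap threshold, and made-up taxa and support values, and it only reports which gene tree clades disagree with the species tree once weakly supported clades are ignored.

```python
from typing import Dict, FrozenSet, Set

Clade = FrozenSet[str]


def compatible(a: Clade, b: Clade) -> bool:
    """Two clades of rooted trees are compatible if they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


def conflicting_clades(
    gene_clades: Dict[Clade, float],
    species_clades: Set[Clade],
    threshold: float,
) -> Dict[Clade, float]:
    """Return gene tree clades (with their bootstrap values) that have
    support >= threshold and conflict with at least one species tree clade.
    The threshold is an illustrative assumption, not a value from the paper."""
    return {
        clade: support
        for clade, support in gene_clades.items()
        if support >= threshold
        and any(not compatible(clade, s) for s in species_clades)
    }


# Hypothetical species tree (((A,B),C),D), given by its non-trivial clades.
species = {frozenset({"A", "B"}), frozenset({"A", "B", "C"})}

# Hypothetical gene tree (((A,C),B),D) with made-up bootstrap values (percent).
gene = {
    frozenset({"A", "C"}): 45.0,       # weakly supported, conflicts with (A,B)
    frozenset({"A", "B", "C"}): 98.0,  # strongly supported, agrees with species tree
}

all_conflicts = conflicting_clades(gene, species, threshold=0.0)
strong_conflicts = conflicting_clades(gene, species, threshold=70.0)
```

With no filtering, the weakly supported clade (A,C) is reported as a disagreement that a topology-based method might explain by an HGT event. With the assumed threshold of 70, no disagreement remains, which mirrors the observation that poorly supported branches can produce false positives.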