Phylogenetic tree reconstruction with protein linkage

  • Authors:
  • Junjie Yu;Henry Chi Ming Leung;Siu Ming Yiu;Yong Zhang;Francis Y. L. Chin;Nathan Hobbs;Amy Y. X. Wang

  • Affiliations:
  • Department of Computer Science, The University of Hong Kong, Hong Kong;Department of Computer Science, The University of Hong Kong, Hong Kong;Department of Computer Science, The University of Hong Kong, Hong Kong;Department of Computer Science, The University of Hong Kong, Hong Kong;Department of Computer Science, The University of Hong Kong, Hong Kong;Institute for Interdisciplinary Information Sciences, Tsinghua University, China;Institute for Interdisciplinary Information Sciences, Tsinghua University, China

  • Venue:
  • ISBRA'12 Proceedings of the 8th international conference on Bioinformatics Research and Applications
  • Year:
  • 2012

Abstract

When reconstructing a phylogenetic tree, one common representation for a species is a binary string indicating the existence of some selected genes/proteins. Up until now, all existing methods have assumed the existence of these genes/proteins to be independent. However, in most cases, this assumption is not valid. In this paper, we consider the reconstruction problem by taking into account the dependency of proteins, i.e. protein linkage. We assume that the tree structure and leaf sequences are given, so we need only find an optimal assignment to the ancestral nodes. We prove that the Phylogenetic Tree Reconstruction with Protein Linkage (PTRPL) problem is NP-complete for three different versions of linkage distance. We provide an efficient dynamic programming algorithm to solve the general problem in O(4^m · n) and O(4^m · (m + n)) time (compared to the straightforward O(4^m · m · n) and O(4^m · m^2 · n) time algorithms), depending on the version of linkage distance used, where n stands for the number of species and m for the number of proteins, i.e. the length of the binary string. We also argue, by experiments, that trees with higher accuracy can be constructed by using linkage information than by using only Hamming distance to measure the differences between the binary strings, thus validating the significance of linkage information.
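
To make the problem setting concrete, the following is a minimal sketch of a straightforward dynamic program for the ancestral assignment problem, not the authors' efficient algorithm. It assumes the tree and leaf strings are given, encodes each binary string of length m as an integer bitmask, and tries all 2^m states at every node. The abstract does not define the linkage distances, so the distance is a pluggable parameter and plain Hamming distance is used as a stand-in. With Hamming distance this corresponds to the O(4^m · m · n) baseline mentioned above. All names (`min_cost_assignment`, `children`, `leaves`, `dist`) are hypothetical and chosen for illustration only.

```python
from typing import Callable, Dict, Hashable, List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two bitmask-encoded binary strings."""
    return bin(a ^ b).count("1")


def min_cost_assignment(
    children: Dict[Hashable, List[Hashable]],
    leaves: Dict[Hashable, int],
    root: Hashable,
    m: int,
    dist: Callable[[int, int], int] = hamming,  # stand-in; a linkage distance could be passed instead
) -> Tuple[float, Dict[Hashable, int]]:
    """Return (total cost, node -> state) minimising the sum of dist over all tree edges.

    children maps each internal node to its child nodes; leaves maps each
    leaf to its fixed binary string, encoded as an integer with m bits.
    """
    states = range(1 << m)
    inf = float("inf")
    cost: Dict[Hashable, List[float]] = {}
    choice: Dict[Tuple[Hashable, int], int] = {}  # best child state for a given parent state

    def solve(v: Hashable) -> None:
        if v in leaves:
            # A leaf may only take its observed string.
            cost[v] = [0 if s == leaves[v] else inf for s in states]
            return
        total: List[float] = [0] * (1 << m)
        for c in children[v]:
            solve(c)
            for s in states:
                # Try every child state t for parent state s (4^m pairs per edge).
                best_t = min(states, key=lambda t: cost[c][t] + dist(s, t))
                choice[(c, s)] = best_t
                total[s] += cost[c][best_t] + dist(s, best_t)
        cost[v] = total

    solve(root)
    root_state = min(states, key=lambda s: cost[root][s])

    assignment: Dict[Hashable, int] = {}

    def assign(v: Hashable, s: int) -> None:
        # Walk back down the tree, following the recorded best choices.
        assignment[v] = s
        for c in children.get(v, []):
            assign(c, choice[(c, s)])

    assign(root, root_state)
    return cost[root][root_state], assignment


if __name__ == "__main__":
    # Toy tree ((l1, l2), l3) with m = 3 proteins.
    toy_children = {"root": ["x", "l3"], "x": ["l1", "l2"]}
    toy_leaves = {"l1": 0b110, "l2": 0b100, "l3": 0b011}
    total, states_by_node = min_cost_assignment(toy_children, toy_leaves, "root", 3)
    print(total, {k: format(v, "03b") for k, v in states_by_node.items()})
```

Because the distance is passed in as a function, a distance that accounts for dependencies between proteins could be substituted without changing the recursion. The exponential factor in m remains, which is consistent with the NP-completeness result stated in the abstract.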