The stability of phylogenetic tree construction of the HIV-1 virus using genome-ordering data versus env gene data

  • Authors:
  • Anna Badimo;Anton Bergheim;Scott Hazelhurst;Maria Papathanasopolous;Lynn Morris

  • Affiliations:
  • School of Computer Science, University of the Witwatersrand, Johannesburg, Private Bag 3, 2050 Wits;School of Computer Science, University of the Witwatersrand, Johannesburg, Private Bag 3, 2050 Wits;School of Computer Science, University of the Witwatersrand, Johannesburg, Private Bag 3, 2050 Wits;AIDS Vaccine Research Unit, National Institute for Communicable Diseases, Private Bag X4, Sandringham, 2131 Johannesburg;AIDS Vaccine Research Unit, National Institute for Communicable Diseases, Private Bag X4, Sandringham, 2131 Johannesburg

  • Venue:
  • SAICSIT '03 Proceedings of the 2003 annual research conference of the South African institute of computer scientists and information technologists on Enablement through technology
  • Year:
  • 2003

Abstract

Phylogenetics is difficult: many versions of the problem are NP-hard, and HIV data in particular poses challenges. The phylogenetic algorithms that have been proposed can be broadly categorised into three groups: those that use some sort of distance measure on sequences; those that use a model of the number of evolutionary events (maximum parsimony); and maximum-likelihood approaches.

Traditionally, phylogenetics is done on gene sequence data, and algorithms have been designed with this purpose in mind. Recent advances in sequencing technology, however, have made full genome sequences increasingly accessible. Mass insertions, deletions, transversions and recombinations make traditional phylogenetic approaches ill-suited for analysing this type of data. To address such data, a number of authors have proposed an approach based on gene ordering.

This paper is an experimental comparison of phylogenetic analysis performed on env data and full genome data, using maximum-likelihood and gene-order-based algorithms on the same data. The purpose is to study the differences in what the algorithms predict and, to a lesser extent, to evaluate their efficiency.

We conclude that the trees constructed for the same set of data differ in structure. Full-genome analysis captures deep evolutionary events, and env gene data analysis shallower events; the challenge remains to reconcile the results with a consistent model of evolution. Understanding which approach is best to use has obvious importance for biologists, but is equally important for computer scientists, who need to understand the trade-offs between correctness and efficiency when designing heuristics.
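To make the idea of a gene-order-based comparison concrete, the following is a minimal sketch of the breakpoint distance, one commonly used measure on gene orders. It is an illustration only, not necessarily the measure or software used in the paper. It assumes each genome is a linear list of signed integers (the sign denotes strand orientation), that both genomes contain the same genes exactly once, and it ignores the genome ends; the function names and the example genomes are hypothetical.

```python
def adjacencies(order):
    """Return the set of adjacencies in a signed, linear gene order.

    The adjacency (a, b) read on one strand is the same as (-b, -a)
    read on the other, so each pair is stored in one canonical form.
    """
    adj = set()
    for a, b in zip(order, order[1:]):
        adj.add(max((a, b), (-b, -a)))
    return adj


def breakpoint_distance(g1, g2):
    """Count adjacencies of g1 that are not preserved in g2.

    Assumption: both genomes carry the same genes, each exactly once.
    """
    if sorted(map(abs, g1)) != sorted(map(abs, g2)):
        raise ValueError("genomes must contain the same set of genes")
    return len(adjacencies(g1) - adjacencies(g2))


# Hypothetical example: genes 2 and 3 are inverted as a block.
reference = [1, 2, 3, 4, 5]
rearranged = [1, -3, -2, 4, 5]
print(breakpoint_distance(reference, rearranged))  # 2
```

A single inversion breaks the two adjacencies at its boundaries and preserves the ones inside it, which is why the example yields 2. A matrix of such pairwise distances could then be passed to a distance-based tree-building method.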