Discriminating between rate heterogeneity and interspecific recombination in DNA sequence alignments with phylogenetic factorial hidden Markov models

  • Authors:
  • Dirk Husmeier

  • Affiliations:
  • Biomathematics and Statistics Scotland Edinburgh EH9 3JZ, UK

  • Venue:
  • Bioinformatics
  • Year:
  • 2005

Quantified Score

Hi-index 3.84

Visualization

Abstract

Motivation: A recently proposed method for detecting recombination in DNA sequence alignments is based on the combination of hidden Markov models (HMMs) with phylogenetic trees. Although this method was found to detect breakpoints of recombinant regions more accurately than most existing techniques, it inherently fails to distinguish between recombination and rate variation. In the present paper, we propose to marry the phylogenetic tree to a factorial HMM (FHMM). The states of the first hidden chain represent tree topologies, whereas the states of the second independent hidden chain represent different global scaling factors of the branch lengths. Inference is done in terms of a hierarchical Bayesian model, where parameters and hidden states are sampled from the posterior distribution with Gibbs sampling. Results: We have tested the proposed model on various synthetic and real-world DNA sequence alignments. The simulation results suggest that as opposed to the standard phylogenetic HMM, the phylogenetic FHMM clearly distinguishes between recombination and rate heterogeneity and thereby avoids the prediction of spurious recombinant regions. Availability: The proposed method has been implemented in a MATLAB package that extends Kevin Murphy's HMM toolbox. Software and data used in our study are available from http://www.bioss.sari.ac.uk/~dirk/Supplements Contact: dirk@bioss.ac.uk